1. da Silva AF, da Silva Neto AM, Aksenen C, Jeronimo P, Dezordi F, Almeida S, Costa H, Salvato R, Campos TD, Wallau G, on behalf of the Fiocruz Genomic Network. ViralFlow v1.0-a computational workflow for streamlining viral genomic surveillance. NAR Genom Bioinform 2024; 6:lqae056. [PMID: 38800829; PMCID: PMC11127631; DOI: 10.1093/nargab/lqae056] [Received: 01/19/2024] [Revised: 04/15/2024] [Accepted: 05/09/2024] [Indexed: 05/29/2024]
Abstract
ViralFlow v1.0 is a computational workflow developed for viral genomic surveillance. Several key changes turned ViralFlow into a general-purpose reference-based genome assembler for all viruses with an available reference genome. New virus-agnostic modules were implemented to further study nucleotide and amino acid mutations. ViralFlow v1.0 runs on a broad range of computational infrastructures, from laptop computers to high-performance computing (HPC) environments, and generates standard and well-formatted outputs suited for both public health reporting and scientific problem-solving. ViralFlow v1.0 is available at: https://viralflow.github.io/index-en.html.
Affiliation(s)
- Alexandre Freitas da Silva
  - Departamento de Entomologia, Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
  - Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Antonio Marinho da Silva Neto
  - Data Analysis and Engineering, Genomic Surveillance Unit, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Filipe Zimmer Dezordi
  - Departamento de Entomologia, Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
  - Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Hudson Marques Paula Costa
  - Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Richard Steiner Salvato
  - Secretaria Estadual da Saúde do Rio Grande do Sul, Centro Estadual de Vigilância em Saúde, Laboratório Central de Saúde Pública, Porto Alegre, Rio Grande do Sul 90450-190, Brazil
- Tulio de Lima Campos
  - Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
- Gabriel da Luz Wallau
  - Departamento de Entomologia, Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
  - Núcleo de Bioinformática (NBI), Instituto Aggeu Magalhães (IAM)-Fundação Oswaldo Cruz-FIOCRUZ, Recife, Pernambuco 50670-420, Brazil
  - Department of Arbovirology, Bernhard Nocht Institute for Tropical Medicine, WHO Collaborating Center for Arbovirus and Hemorrhagic Fever Reference and Research, National Reference Center for Tropical Infectious Diseases, Bernhard-Nocht-Strasse 74, D-20359 Hamburg, Germany
2. Duchen D, Clipman SJ, Vergara C, Thio CL, Thomas DL, Duggal P, Wojcik GL. A hepatitis B virus (HBV) sequence variation graph improves alignment and sample-specific consensus sequence construction. PLoS One 2024; 19:e0301069. [PMID: 38669259; PMCID: PMC11051683; DOI: 10.1371/journal.pone.0301069] [Received: 12/05/2023] [Accepted: 03/09/2024] [Indexed: 04/28/2024]
Abstract
Nearly 300 million individuals live with chronic hepatitis B virus (HBV) infection (CHB), for which no curative therapy is available. As viral diversity is associated with pathogenesis and immunological control of infection, improved methods to characterize this diversity could aid drug development efforts. Conventionally, viral sequencing data are mapped/aligned to a reference genome, and only the aligned sequences are retained for analysis. Thus, reference selection is critical, yet selecting the most representative reference a priori remains difficult. We investigate an alternative pangenome approach which can combine multiple reference sequences into a graph which can be used during alignment. Using simulated short-read sequencing data generated from publicly available HBV genomes and real sequencing data from an individual living with CHB, we demonstrate alignment to a phylogenetically representative 'genome graph' can improve alignment, avoid issues of reference ambiguity, and facilitate the construction of sample-specific consensus sequences more genetically similar to the individual's infection. Graph-based methods can, therefore, improve efforts to characterize the genetics of viral pathogens, including HBV, and have broader implications in host-pathogen research.
Affiliation(s)
- Dylan Duchen
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States of America
  - Center for Biomedical Data Science, Yale School of Medicine, New Haven, CT, United States of America
- Steven J. Clipman
  - Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Candelaria Vergara
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States of America
- Chloe L. Thio
  - Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- David L. Thomas
  - Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Priya Duggal
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States of America
- Genevieve L. Wojcik
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States of America
3. Ji D, Aboukhalil R, Moshiri N. ViralWasm: a client-side user-friendly web application suite for viral genomics. Bioinformatics 2024; 40:btae018. [PMID: 38200583; PMCID: PMC10809900; DOI: 10.1093/bioinformatics/btae018] [Received: 10/20/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024]
Abstract
MOTIVATION: The genomic surveillance of viral pathogens such as SARS-CoV-2 and HIV-1 has been critical to modern epidemiology and public health, but the use of sequence analysis pipelines requires computational expertise, and web-based platforms require sending potentially sensitive raw sequence data to remote servers. RESULTS: We introduce ViralWasm, a user-friendly graphical web application suite for viral genomics. All ViralWasm tools utilize WebAssembly to execute the original command line tools client-side directly in the web browser without any user setup, at a cost of just a 2-3x slowdown with respect to their command line counterparts. AVAILABILITY AND IMPLEMENTATION: The ViralWasm tool suite can be accessed at: https://niema-lab.github.io/ViralWasm.
Affiliation(s)
- Daniel Ji
  - Department of Computer Science & Engineering, UC San Diego, La Jolla, CA 92093, United States
- Niema Moshiri
  - Department of Computer Science & Engineering, UC San Diego, La Jolla, CA 92093, United States
4. Lim HGM, Fann YC, Lee YCG. COWID: an efficient cloud-based genomics workflow for scalable identification of SARS-COV-2. Brief Bioinform 2023; 24:bbad280. [PMID: 37738400; PMCID: PMC10516370; DOI: 10.1093/bib/bbad280] [Received: 12/30/2022] [Revised: 07/15/2023] [Accepted: 07/19/2023] [Indexed: 09/24/2023]
Abstract
Implementing a specific cloud resource to analyze extensive genomic data on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) poses a challenge when resources are limited. To overcome this, we repurposed a cloud platform initially designed for use in research on cancer genomics (https://cgc.sbgenomics.com) to enable its use in research on SARS-CoV-2 to build Cloud Workflow for Viral and Variant Identification (COWID). COWID is a workflow based on the Common Workflow Language that realizes the full potential of sequencing technology for use in reliable SARS-CoV-2 identification and leverages cloud computing to achieve efficient parallelization. COWID outperformed other contemporary methods for identification by offering scalable identification and reliable variant findings with no false-positive results. COWID typically processed each sample of raw sequencing data within 5 min at a cost of only US$0.01. The COWID source code is publicly available (https://github.com/hendrick0403/COWID) and can be accessed on any computer with Internet access. COWID is designed to be user-friendly; it can be implemented without prior programming knowledge. Therefore, COWID is a time-efficient tool that can be used during a pandemic.
Affiliation(s)
- Hendrick Gao-Min Lim
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan 11031
  - Department of Medical Research, Tzu Chi Hospital Indonesia, Pantai Indah Kapuk, Greater Jakarta, Indonesia 14470
- Yang C Fann
  - IT and Bioinformatics Program, Division of Intramural, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA 20892
- Yuan-Chii Gladys Lee
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan 11031
5. Duchen D, Clipman S, Vergara C, Thio CL, Thomas DL, Duggal P, Wojcik GL. A hepatitis B virus (HBV) sequence variation graph improves sequence alignment and sample-specific consensus sequence construction for genetic analysis of HBV. bioRxiv 2023:2023.01.11.523611. [PMID: 36711598; PMCID: PMC9882026; DOI: 10.1101/2023.01.11.523611] [Indexed: 01/14/2023]
Abstract
Hepatitis B virus (HBV) remains a global public health concern, with over 250 million individuals living with chronic HBV infection (CHB) and no curative therapy currently available. Viral diversity is associated with CHB pathogenesis and immunological control of infection. Improved methods to characterize the viral genome at both the population and intra-host level could aid drug development efforts. Conventionally, HBV sequencing data are aligned to a linear reference genome and only sequences capable of aligning to the reference are captured for analysis. Reference selection has additional consequences, including for sample-specific 'consensus' sequence construction. It remains unclear how to select a reference from available sequences and whether a single reference is sufficient for genetic analyses. Using simulated short-read sequencing data generated from full-length publicly available HBV genome sequences and HBV sequencing data from a longitudinally sampled individual with CHB, we investigate alternative graph-based alignment approaches. We demonstrate that using a phylogenetically representative 'genome graph' for alignment, rather than linear reference sequences, avoids issues of reference ambiguity, improves alignment, and facilitates the construction of sample-specific consensus sequences genetically similar to an individual's infection. Graph-based methods can therefore improve efforts to characterize the genetics of viral pathogens, including HBV, and may have broad implications in host-pathogen research.
Affiliation(s)
- Dylan Duchen
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Steven Clipman
  - Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Candelaria Vergara
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Chloe L Thio
  - Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- David L Thomas
  - Division of Infectious Diseases, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Priya Duggal
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Genevieve L Wojcik
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
6. Váradi A, Kaszab E, Kardos G, Prépost E, Szarka K, Laczkó L. Rapid genotyping of targeted viral samples using Illumina short-read sequencing data. PLoS One 2022; 17:e0274414. [PMID: 36112576; PMCID: PMC9481040; DOI: 10.1371/journal.pone.0274414] [Received: 03/29/2022] [Accepted: 08/30/2022] [Indexed: 11/19/2022]
Abstract
The most important information about microorganisms might be their accurate genome sequence. Using current Next Generation Sequencing methods, sequencing data can be generated at an unprecedented pace. However, we still lack tools for the automated and accurate reference-based genotyping of viral sequencing reads. This paper presents our pipeline designed to reconstruct the dominant consensus genome of viral samples and analyze their within-host variability. We benchmarked our approach on numerous datasets and showed that the consensus genome of samples could be obtained reliably without further manual data curation. Our pipeline can be a valuable tool for the rapid identification of viral samples. The pipeline is publicly available on the project’s GitHub page (https://github.com/laczkol/QVG).
Affiliation(s)
- Alex Váradi
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Department of Laboratory Medicine, University of Pécs, Pécs, Hungary
- Eszter Kaszab
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
  - Veterinary Medical Research Institute, Budapest, Hungary
- Gábor Kardos
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Eszter Prépost
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Krisztina Szarka
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
- Levente Laczkó
  - Department of Metagenomics, University of Debrecen, Debrecen, Hungary
  - ELKH-DE Conservation Biology Research Group, Debrecen, Hungary