Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Maarala AI, Bzhalava Z, Dillner J, Heljanko K, Bzhalava D. ViraPipe: scalable parallel pipeline for viral metagenome analysis from next generation sequencing reads. Bioinformatics 2018;34:928-935. [PMID: 29106455 DOI: 10.1093/bioinformatics/btx702] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/09/2017] [Accepted: 11/01/2017] [Indexed: 11/13/2022] Open

Cited by Other Article(s)

Merino Martinez R, Müller H, Negru S, Ormenisan A, Arroyo Mühr LS, Zhang X, Trier Møller F, Clements MS, Kozlakidis Z, Pimenoff VN, Wilkowski B, Boeckhout M, Öhman H, Chong S, Holzinger A, Lehtinen M, van Veen EB, Bała P, Widschwendter M, Dowling J, Törnroos J, Snyder MP, Dillner J. Human exposome assessment platform. Environ Epidemiol 2021;5:e182. [PMID: 34909561 PMCID: PMC8663864 DOI: 10.1097/ee9.0000000000000182] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/02/2021] [Accepted: 11/14/2021] [Indexed: 11/26/2022] Open

Fuentes-Trillo A, Monzó C, Manzano I, Santiso-Bellón C, Andrade JDSRD, Gozalbo-Rovira R, García-García AB, Rodríguez-Díaz J, Chaves FJ. Benchmarking different approaches for Norovirus genome assembly in metagenome samples. BMC Genomics 2021;22:849. [PMID: 34819031 PMCID: PMC8611953 DOI: 10.1186/s12864-021-08067-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2021] [Accepted: 10/10/2021] [Indexed: 12/22/2022] Open

Maarala AI, Arasalo O, Valenzuela D, Mäkinen V, Heljanko K. Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment. PLoS One 2021;16:e0255260. [PMID: 34343181 PMCID: PMC8330939 DOI: 10.1371/journal.pone.0255260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2021] [Accepted: 07/12/2021] [Indexed: 11/19/2022] Open

Abstract

Computational pan-genomics utilizes information from multiple individual genomes in large-scale comparative analysis. Genetic variation between case-controls, ethnic groups, or species can be discovered thoroughly using pan-genomes of such subpopulations. Whole-genome sequencing (WGS) data volumes are growing rapidly, making genomic data compression and indexing methods very important. Despite current space-efficient repetitive sequence compression and indexing methods, the deployed compression methods are often sequential, computationally time-consuming, and do not provide efficient sequence alignment performance on vast collections of genomes such as pan-genomes. For performing rapid analytics with the ever-growing genomics data, data compression and indexing methods have to exploit distributed and parallel computing more efficiently. Instead of strict genome data compression methods, we will focus on the efficient construction of a compressed index for pan-genomes. Compressed hybrid-index enables fast sequence alignments to several genomes at once while shrinking the index size significantly compared to traditional indexes. We propose a scalable distributed compressed hybrid-indexing method for large genomic data sets enabling pan-genome-based sequence search and read alignment capabilities. We show the scalability of our tool, DHPGIndex, by executing experiments in a distributed Apache Spark-based computing cluster comprising 448 cores distributed over 26 nodes. The experiments have been performed both with human and bacterial genomes. DHPGIndex built a BLAST index for n = 250 human pan-genome with an 870:1 compression ratio (CR) in 342 minutes and a Bowtie2 index with 157:1 CR in 397 minutes. For n = 1,000 human pan-genome, the BLAST index was built in 1,520 minutes with 532:1 CR and the Bowtie2 index in 1,938 minutes with 76:1 CR. Bowtie2 aligned 14.6 GB of paired-end reads to the compressed (n = 1,000) index in 31.7 minutes on a single node. Compressing n = 13,375,031 (488 GB) GenBank database to BLAST index resulted in CR of 62:1 in 575 minutes. BLASTing 189,864 CRISPR-Cas9 gRNA target sequences (23 MB in total) to the compressed index of human pan-genome (n = 1,000) finished in 45 minutes on a single node. 30 MB mixed bacterial sequences (n = 599) were blasted to the compressed index of 488 GB GenBank database (n = 13,375,031) in 26 minutes on 25 nodes. 78 MB mixed sequences (n = 4,167) were blasted to the compressed index of 18 GB E. coli sequence database (n = 745,409) in 5.4 minutes on a single node.

Posada-Céspedes S, Seifert D, Topolsky I, Jablonski KP, Metzner KJ, Beerenwinkel N. V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data. Bioinformatics 2021;37:1673-1680. [PMID: 33471068 PMCID: PMC8289377 DOI: 10.1093/bioinformatics/btab015] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 06/11/2020] [Revised: 12/09/2020] [Accepted: 01/08/2021] [Indexed: 12/30/2022] Open

Abstract

Motivation

High-throughput sequencing technologies are used increasingly not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations.

Results

To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape.

Availability and implementation

V-pipe is freely available at https://github.com/cbg-ethz/V-pipe.

Supplementary information

Supplementary data are available at Bioinformatics online.

Krissaane I, De Niz C, Gutiérrez-Sacristán A, Korodi G, Ede N, Kumar R, Lyons J, Manrai A, Patel C, Kohane I, Avillach P. Scalability and cost-effectiveness analysis of whole genome-wide association studies on Google Cloud Platform and Amazon Web Services. J Am Med Inform Assoc 2020;27:1425-1430. [PMID: 32719837 PMCID: PMC7534581 DOI: 10.1093/jamia/ocaa068] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/23/2020] [Revised: 03/20/2020] [Accepted: 04/17/2020] [Indexed: 01/14/2023] Open

Transcription of human papillomavirus oncogenes in head and neck squamous cell carcinomas. Vaccine 2020;38:4066-4070. [PMID: 32362526 DOI: 10.1016/j.vaccine.2020.04.049] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/16/2019] [Revised: 04/03/2020] [Accepted: 04/20/2020] [Indexed: 12/24/2022]

Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. Infect Genet Evol 2020;80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]

Maabar M, Davison AJ, Vučak M, Thorburn F, Murcia PR, Gunson R, Palmarini M, Hughes J. DisCVR: Rapid viral diagnosis from high-throughput sequencing data. Virus Evol 2019;5:vez033. [PMID: 31528358 PMCID: PMC6735924 DOI: 10.1093/ve/vez033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022] Open