101
Vembar SS, Seetin M, Lambert C, Nattestad M, Schatz MC, Baybayan P, Scherf A, Smith ML. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing. DNA Res 2016; 23:339-51. [PMID: 27345719 PMCID: PMC4991835 DOI: 10.1093/dnares/dsw022] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/24/2015] [Accepted: 05/10/2016] [Indexed: 01/03/2023]
Abstract
The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90–99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission.
Affiliation(s)
- Shruthi Sridhar Vembar
- Unité Biologie des Interactions Hôte-Parasite, Département de Parasites et Insectes Vecteurs, Institut Pasteur, Paris 75015, France; CNRS, ERL 9195, Paris 75015, France; INSERM, Unit U1201, Paris 75015, France
- Michael C Schatz
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Artur Scherf
- Unité Biologie des Interactions Hôte-Parasite, Département de Parasites et Insectes Vecteurs, Institut Pasteur, Paris 75015, France; CNRS, ERL 9195, Paris 75015, France; INSERM, Unit U1201, Paris 75015, France
102
Zukurov JP, do Nascimento-Brito S, Volpini AC, Oliveira GC, Janini LMR, Antoneli F. Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage. Algorithms Mol Biol 2016; 11:2. [PMID: 26973707 PMCID: PMC4788855 DOI: 10.1186/s13015-016-0064-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/25/2015] [Accepted: 02/25/2016] [Indexed: 12/16/2022]
Abstract
Background In this paper we propose a method, and discuss its computational implementation as an integrated tool, for the analysis of viral genetic diversity from data generated by high-throughput sequencing. The main motivation for this work is to better understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and influenza. Most methods for viral diversity estimation proposed so far are designed to benefit from the longer reads produced by some next-generation sequencing platforms in order to estimate a population of haplotypes that represents the diversity of the original population. The method proposed here is custom-made to take advantage of the very low error rate and extremely deep coverage per site, which are the main features of some neglected technologies that have not received much attention because the short length of their reads precludes haplotype estimation. This approach allowed us to avoid some hard problems related to haplotype reconstruction (the need for long reads, preliminary error filtering and assembly). Results We propose to measure the genetic diversity of a viral population through a family of multinomial probability distributions indexed by the sites of the virus genome, each one representing the distribution of nucleic bases per site. Moreover, the implementation of the method focuses on two main optimization strategies: a read mapping/alignment procedure that aims to recover the maximum possible number of short reads, and the inference of the multinomial parameters in a Bayesian framework with smoothed Dirichlet estimation. The Bayesian approach provides conditional probability distributions for the multinomial parameters, allowing one to take into account the prior information of the control experiment and providing a natural way to separate signal from noise, since it automatically furnishes Bayesian confidence intervals and thus avoids the drawbacks of preliminary error filtering.
Conclusions The methods described in this paper have been implemented as an integrated tool called Tanden (Tool for Analysis of Diversity in Viral Populations) and successfully tested on samples obtained from HIV-1 strain NL4-3 (group M, subtype B) cultivations on primary human cell cultures in many distinct viral propagation conditions. Tanden is written in C# (Microsoft), runs on the Windows operating system, and can be downloaded from: http://tanden.url.ph/.
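The per-site multinomial model with Dirichlet estimation described in this abstract can be illustrated with a minimal Python sketch. This is not the Tanden implementation (which is written in C#). The symmetric `pseudocount` prior, the function names, and the Monte Carlo interval are all assumptions made for illustration; the abstract instead derives prior information from a control experiment.

```python
import random

BASES = "ACGT"

def posterior_base_frequencies(counts, pseudocount=1.0):
    """Posterior mean of the per-site multinomial parameters.

    Assumes a symmetric Dirichlet prior with `pseudocount` per base
    (a hypothetical choice, not taken from the paper).
    """
    total = sum(counts[b] for b in BASES) + pseudocount * len(BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}

def credible_interval(counts, base, pseudocount=1.0, level=0.95,
                      draws=20000, seed=0):
    """Monte Carlo credible interval for the frequency of one base.

    The marginal of a Dirichlet posterior for a single component is a
    Beta distribution, so we sample from that Beta directly.
    """
    rng = random.Random(seed)
    alpha = counts[base] + pseudocount
    beta = sum(counts[b] + pseudocount for b in BASES if b != base)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    low = samples[int((1 - level) / 2 * draws)]
    high = samples[int((1 + level) / 2 * draws) - 1]
    return low, high

# Hypothetical base counts at one deeply covered genome site.
site_counts = {"A": 9850, "C": 12, "G": 130, "T": 8}
freqs = posterior_base_frequencies(site_counts)
g_low, g_high = credible_interval(site_counts, "G")
```

A variant base would then be treated as signal only if its interval lies clearly above the interval obtained for the same site in a control run, which is one way to read the abstract's point about separating signal from noise without a preliminary error filter.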
103
Abstract
Diversity, evolution, and epidemiology of HIV are directly relevant to HIV transmission and pathogenesis; hence, they play a key role in antiretroviral treatment and vaccine design. Global HIV whole-genome sequencing would provide a treasure chest of data to answer many questions still open in these fields. An article by Berg et al. in this issue of the Journal of Clinical Microbiology describes a universal strategy to amplify and sequence heterogeneous HIV whole genomes (M. G. Berg, J. Yamaguchi, E. Alessandri-Gradt, R. W. Tell, J.-C. Plantier, and C. A. Brennan, J Clin Microbiol 54:868-882, 2016, http://dx.doi.org/10.1128/JCM.02479-15).
104
St. John EP, Simen BB, Turenchalk GS, Braverman MS, Abbate I, Aerssens J, Bouchez O, Gabriel C, Izopet J, Meixenberger K, Di Giallonardo F, Schlapbach R, Paredes R, Sakwa J, Schmitz-Agheguian GG, Thielen A, Victor M, Metzner KJ, Däumer MP. A Follow-Up of the Multicenter Collaborative Study on HIV-1 Drug Resistance and Tropism Testing Using 454 Ultra Deep Pyrosequencing. PLoS One 2016; 11:e0146687. [PMID: 26756901 PMCID: PMC4710461 DOI: 10.1371/journal.pone.0146687] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 10/15/2015] [Accepted: 12/21/2015] [Indexed: 11/19/2022]
Abstract
BACKGROUND Ultra deep sequencing is of increasing use not only in research but also in diagnostics. For implementation of ultra deep sequencing assays in clinical laboratories for routine diagnostics, intra- and inter-laboratory testing are of the utmost importance. METHODS A multicenter study was conducted to validate an updated assay design for 454 Life Sciences' GS FLX Titanium system targeting protease/reverse transcriptase (RTP) and env (V3) regions to identify HIV-1 drug-resistance mutations and determine co-receptor use with high sensitivity. The study included 30 HIV-1 subtype B and 6 subtype non-B samples with viral titers (VT) of 3,940-447,400 copies/mL, two dilution series (52,129-1,340 and 25,130-734 copies/mL), and triplicate samples. Amplicons spanning PR codons 10-99, RT codons 1-251 and the entire V3 region were generated using barcoded primers. Analysis was performed using the GS Amplicon Variant Analyzer and geno2pheno for tropism. For comparison, population sequencing was performed using the ViroSeq HIV-1 genotyping system. RESULTS The median sequencing depth across the 11 sites was 1,829 reads per position for RTP (IQR 592-3,488) and 2,410 for V3 (IQR 786-3,695). Ten preselected drug resistant variants were measured across sites and showed high inter-laboratory correlation across all sites with data (P<0.001). The triplicate samples of a plasmid mixture confirmed the high inter-laboratory consistency (mean% ± stdev: 4.6 ±0.5, 4.8 ±0.4, 4.9 ±0.3) and revealed good intra-laboratory consistency (mean% range ± stdev range: 4.2-5.2 ± 0.04-0.65). In the two dilution series, no variants >20% were missed, variants 2-10% were detected at most sites (even at low VT), and variants 1-2% were detected by some sites. All mutations detected by population sequencing were also detected by UDS.
CONCLUSIONS This assay design results in an accurate and reproducible approach to analyze HIV-1 mutant spectra, even at variant frequencies well below those routinely detectable by population sequencing.
Affiliation(s)
- Birgitte B. Simen
- 454 Life Sciences, A Roche Company, Branford, CT, United States of America
- Isabella Abbate
- National Institute for Infectious Diseases “L. Spallanzani”, Rome, Italy
- Jeroen Aerssens
- Janssen Infectious Diseases-Diagnostics bvba, Beerse, Belgium
- Olivier Bouchez
- Plateforme Génomique Toulouse/Laboratoire Génétique Cellulaire, Toulouse, France
- Francesca Di Giallonardo
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Ralph Schlapbach
- Functional Genomics Center Zurich, University of Zurich, ETH Zurich, Zurich, Switzerland
- Roger Paredes
- Institut de Recerca de la SIDA–IrsiCaixa, Badalona, Spain
- James Sakwa
- Technology Innovation Agency-National Genomics Platform, Durban, South Africa
- Karin J. Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Institute of Medical Virology, University of Zurich, Zurich, Switzerland
105
Bellecave P, Recordon-Pinson P, Fleury H. Evaluation of Automatic Analysis of Ultradeep Pyrosequencing Raw Data to Determine Percentages of HIV Resistance Mutations in Patients Followed-Up in Hospital. AIDS Res Hum Retroviruses 2016; 32:85-92. [PMID: 26529549 DOI: 10.1089/aid.2015.0201] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
Abstract
A major obstacle to using next generation sequencing (NGS) technology in clinical routine practice is reliable data analysis. Thousands of sequences need to be aligned and validated, to exclude sequencing artifacts and generate accurate results. We compared two analysis pipelines for Roche 454 ultradeep pyrosequencing (UDPS) raw data generated from HIV-1 clinical samples: a commercial and fully automated Web-based software NGS HIV-1 Module (SmartGene, Zug, Switzerland) vs. the Amplicon Variant Analyzer software (AVA, 454 Life Sciences; Roche). Results were also compared to those obtained with Sanger sequencing. HIV-1 reverse transcriptase and protease genes from 34 plasma samples were submitted to Sanger sequencing and GS Junior UDPS. Raw UDPS data (sff files) from all samples were analyzed with AVA 2.7 software plus manual review of the alignments and the fully automated SmartGene NGS HIV-1 Module prototype (SMG). Results obtained with both analysis pipelines showed good correlation (85.0%). Divergent results were mainly observed at homopolymer positions, such as K101, where the frame-aware alignment and error corrections of the automated approach were more efficient and more accurate, both in terms of detecting and quantifying drug resistance mutations. Our study shows that NGS data can easily be analyzed via a fully automated analysis pipeline, here the SmartGene NGS HIV-1 Module, thus minimizing the need for manual review of alignments by the user, otherwise essential to ensure accurate results. Such automated analysis pipelines may facilitate the adoption of NGS platforms in the routine clinical laboratory.
Affiliation(s)
- Pantxika Bellecave
- CNRS-UMR 5234, Microbiologie Fondamentale et Pathogénicité, Université Bordeaux Segalen, Bordeaux, France
- Centre Hospitalier Universitaire de Bordeaux (CHU), Laboratoire de Virologie, Bordeaux, France
- Patricia Recordon-Pinson
- CNRS-UMR 5234, Microbiologie Fondamentale et Pathogénicité, Université Bordeaux Segalen, Bordeaux, France
- Centre Hospitalier Universitaire de Bordeaux (CHU), Laboratoire de Virologie, Bordeaux, France
- Hervé Fleury
- CNRS-UMR 5234, Microbiologie Fondamentale et Pathogénicité, Université Bordeaux Segalen, Bordeaux, France
- Centre Hospitalier Universitaire de Bordeaux (CHU), Laboratoire de Virologie, Bordeaux, France
106
A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations. J Mol Biol 2015; 428:238-250. [PMID: 26711506 DOI: 10.1016/j.jmb.2015.12.012] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/07/2015] [Revised: 11/25/2015] [Accepted: 12/16/2015] [Indexed: 01/01/2023]
Abstract
Determining the composition of viral populations is becoming increasingly important in the field of medical virology. While recently developed computational tools for viral haplotype analysis allow for correcting sequencing errors, they do not always allow for the removal of errors occurring in the upstream experimental protocol, such as PCR errors. Primer IDs (pIDs) are one method to address this problem by harnessing redundant template resampling for error correction. By using a reference mixture of five HIV-1 strains, we show how pIDs can be useful for estimating key experimental parameters, such as the substitution rate of the PCR process and the reverse transcription (RT) error rate. In addition, we introduce a hidden Markov model for determining the recombination rate of the RT PCR process. We found no strong sequence-specific bias in pID abundances (the same RT efficiencies as compared to commonly used short, specific RT primers) and no effects of pIDs on the estimated distribution of the reference viruses.
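The idea of error correction by redundant template resampling can be sketched in Python as a majority vote among reads that share a primer ID. This is a simplified illustration rather than the authors' pipeline: the function name, the `min_reads` threshold, and the assumption that reads within a group are already aligned and of equal length are all hypothetical.

```python
from collections import Counter, defaultdict

def consensus_by_primer_id(reads, min_reads=3):
    """Collapse reads that share a primer ID into one consensus sequence.

    reads: iterable of (primer_id, sequence) pairs. Sequences within a
    primer ID group are assumed to be aligned and of equal length.
    min_reads: hypothetical minimum group size; smaller groups are
    dropped because a majority vote cannot outvote an error.
    """
    groups = defaultdict(list)
    for primer_id, sequence in reads:
        groups[primer_id].append(sequence)

    consensus = {}
    for primer_id, sequences in groups.items():
        if len(sequences) < min_reads:
            continue
        # Majority base per alignment column; ties go to the base seen first.
        consensus[primer_id] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*sequences)
        )
    return consensus

reads = [
    ("AACGT", "ACGTAC"),
    ("AACGT", "ACGTAC"),
    ("AACGT", "ACGAAC"),  # one copy carries a PCR or sequencing error
    ("GGTCA", "ACTTAC"),  # singleton primer ID, dropped
]
result = consensus_by_primer_id(reads)
```

Because each primer ID tags one original template, the number of consensus sequences also approximates the number of templates sampled, which is why errors introduced after tagging can be separated from true variation.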
107
Jayasundara D, Saeed I, Chang BC, Tang SL, Halgamuge SK. Accurate reconstruction of viral quasispecies spectra through improved estimation of strain richness. BMC Bioinformatics 2015; 16 Suppl 18:S3. [PMID: 26678073 PMCID: PMC4682401 DOI: 10.1186/1471-2105-16-s18-s3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
BACKGROUND Estimating the number of different species (richness) in a mixed microbial population has been a main focus in metagenomic research. Existing methods of species richness estimation rely on the assumption that the reads in each assembled contig correspond to only one of the microbial genomes in the population. This assumption and the underlying probabilistic formulations of existing methods are not useful for quasispecies populations where the strains are highly genetically related. RESULTS On benchmark data sets, our estimation method provided accurate richness estimates (< 0.2 median estimation error) and improved the precision of ViQuaS by 2%-13% and F-score by 1%-9% without compromising the recall rates. We also demonstrate that our estimation method can be used to improve the precision and F-score of ShoRAH by 0%-7% and 0%-5% respectively. CONCLUSIONS The proposed probabilistic estimation method can be used to estimate the richness of viral populations with a quasispecies behavior and to improve the accuracy of the quasispecies spectra reconstructed by the existing methods ViQuaS and ShoRAH in the presence of a moderate level of technical sequencing errors. AVAILABILITY http://sourceforge.net/projects/viquas/.
Affiliation(s)
- Duleepa Jayasundara
- Optimisation and Pattern Recognition Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, VIC 3010, Parkville, Australia
- I Saeed
- Optimisation and Pattern Recognition Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, VIC 3010, Parkville, Australia
- BC Chang
- Yourgene Bioscience, No. 376-5, Fuxing Rd., Shu-Lin District, New Taipei City, Taiwan
- Sen-Lin Tang
- Biodiversity Research Center, Academia Sinica, Taipei 11529, Nan-Kang, Taiwan
- Saman K Halgamuge
- Optimisation and Pattern Recognition Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne, VIC 3010, Parkville, Australia
108
Luk KC, Berg MG, Naccache SN, Kabre B, Federman S, Mbanya D, Kaptué L, Chiu CY, Brennan CA, Hackett J. Utility of Metagenomic Next-Generation Sequencing for Characterization of HIV and Human Pegivirus Diversity. PLoS One 2015; 10:e0141723. [PMID: 26599538 PMCID: PMC4658132 DOI: 10.1371/journal.pone.0141723] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 08/19/2015] [Accepted: 10/12/2015] [Indexed: 02/06/2023]
Abstract
Given the dynamic changes in HIV-1 complexity and diversity, next-generation sequencing (NGS) has the potential to revolutionize strategies for effective HIV global surveillance. In this study, we explore the utility of metagenomic NGS to characterize divergent strains of HIV-1 and to simultaneously screen for other co-infecting viruses. Thirty-five HIV-1-infected Cameroonian blood donor specimens with viral loads of >4.4 log10 copies/ml were selected to include a diverse representation of group M strains. Random-primed NGS libraries, prepared from plasma specimens, resulted in greater than 90% genome coverage for 88% of specimens. Correct subtype designations based on NGS were concordant with sub-region PCR data in 31 of 35 (89%) cases. Complete genomes were assembled for 25 strains, including circulating recombinant forms with relatively limited data available (7 CRF11_cpx, 2 CRF13_cpx, 1 CRF18_cpx, and 1 CRF37_cpx), as well as 9 unique recombinant forms. HPgV (formerly designated GBV-C) co-infection was detected in 9 of 35 (25%) specimens, of which eight specimens yielded complete genomes. The recovered HPgV genomes formed a diverse cluster with genotype 1 sequences previously reported from Ghana, Uganda, and Japan. The extensive genome coverage obtained by NGS improved accuracy and confidence in phylogenetic classification of the HIV-1 strains present in the study population relative to conventional sub-region PCR. In addition, these data demonstrate the potential for metagenomic analysis to be used for routine characterization of HIV-1 and identification of other viral co-infections.
Affiliation(s)
- Ka-Cheung Luk
- Abbott Diagnostics, Infectious Disease Research, Abbott Park, Illinois, United States of America
- Michael G Berg
- Abbott Diagnostics, Infectious Disease Research, Abbott Park, Illinois, United States of America
- Samia N Naccache
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
- Beniwende Kabre
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
- Scot Federman
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America
- Charles Y Chiu
- Department of Laboratory Medicine, University of California San Francisco, San Francisco, California, United States of America; UCSF-Abbott Viral Diagnostics and Discovery Center, San Francisco, California, United States of America; Department of Medicine, Division of Infectious Diseases, University of California San Francisco, San Francisco, California, United States of America
- Catherine A Brennan
- Abbott Diagnostics, Infectious Disease Research, Abbott Park, Illinois, United States of America
- John Hackett
- Abbott Diagnostics, Infectious Disease Research, Abbott Park, Illinois, United States of America
109
Ode H, Matsuda M, Matsuoka K, Hachiya A, Hattori J, Kito Y, Yokomaku Y, Iwatani Y, Sugiura W. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq. Front Microbiol 2015; 6:1258. [PMID: 26617593 PMCID: PMC4641896 DOI: 10.3389/fmicb.2015.01258] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 09/12/2015] [Accepted: 10/29/2015] [Indexed: 12/29/2022]
Abstract
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome.
Affiliation(s)
- Hirotaka Ode
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
- Masakazu Matsuda
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
- Kazuhiro Matsuoka
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
- Atsuko Hachiya
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
- Junko Hattori
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
- Yumiko Kito
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
- Yoshiyuki Yokomaku
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
- Yasumasa Iwatani
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan; Department of AIDS Research, Graduate School of Medicine, Nagoya University, Nagoya, Japan
- Wataru Sugiura
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan; Department of AIDS Research, Graduate School of Medicine, Nagoya University, Nagoya, Japan
110
Van der Borght K, Thys K, Wetzels Y, Clement L, Verbist B, Reumers J, van Vlijmen H, Aerssens J. QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles. BMC Bioinformatics 2015; 16:379. [PMID: 26554718 PMCID: PMC4641353 DOI: 10.1186/s12859-015-0812-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/22/2015] [Accepted: 10/31/2015] [Indexed: 12/03/2022]
Abstract
Background Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth (“deep sequencing”), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. Results For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNVD). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNVHS). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNVHS-P80). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNVD performed similarly to the existing approaches. QQ-SNVHS was more sensitive on all test sets but with more false positives. QQ-SNVHS-P80 was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5 %, QQ-SNVHS-P80 revealed a sensitivity of 100 % (vs. 40–60 % for the existing methods) and a specificity of 100 % (vs. 98.0–99.7 % for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. 
Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5 % were consistently detected by QQ-SNVHS-P80 from different generations of Illumina sequencers. Conclusions We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0812-9) contains supplementary material, which is available to authorized users.
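The classification scheme outlined in this abstract (a logistic regression on quantiles of base quality scores, a probability cutoff, and an 80th-percentile frequency filter) can be sketched in Python as follows. The coefficients, the choice of quartiles as features, and all function names are hypothetical; the published QQ-SNV model was trained on real data and is not reproduced here.

```python
import math
import statistics

# Hypothetical coefficients for illustration only; not the published model.
INTERCEPT = -12.0
WEIGHTS = (0.10, 0.15, 0.15)  # one weight per quality-score quartile

def snv_probability(quality_scores):
    """Logistic model on the quartiles of the variant-supporting base qualities."""
    quartiles = statistics.quantiles(quality_scores, n=4)
    z = INTERCEPT + sum(w * q for w, q in zip(WEIGHTS, quartiles))
    return 1.0 / (1.0 + math.exp(-z))

def call_snv(quality_scores, cutoff=0.5):
    """Call a variant when the estimated SNV probability reaches the cutoff."""
    return snv_probability(quality_scores) >= cutoff

def passes_p80(variant_freq, error_freqs):
    """Keep a call only if its frequency reaches the 80th percentile
    of an empirical distribution of error frequencies."""
    p80 = statistics.quantiles(error_freqs, n=5)[-1]
    return variant_freq >= p80

high_quality = [38, 39, 40, 40, 37, 38, 39, 40]
low_quality = [12, 15, 10, 14, 11, 13, 12, 16]
error_freqs = [0.001 * k for k in range(1, 11)]
```

With these made-up numbers, the low-quality candidate is rejected at the default cutoff of 0.5 but accepted at the high-sensitivity cutoff of 0.0001, which shows why the abstract pairs the lower cutoff with a frequency filter to recover specificity.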
Affiliation(s)
- Koen Van der Borght
- Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium; Interuniversity Institute for Biostatistics and statistical Bioinformatics, Katholieke Universiteit Leuven, B-3000, Leuven, Belgium.
- Kim Thys
- Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Yves Wetzels
- Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Lieven Clement
- Ghent University, Applied Mathematics, Informatics and Statistics, B-9000, Ghent, Belgium.
- Bie Verbist
- Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Joke Reumers
- Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Jeroen Aerssens
- Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
111
Wu SH, Rodrigo AG. Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals. BMC Bioinformatics 2015; 16:357. [PMID: 26536860 PMCID: PMC4634753 DOI: 10.1186/s12859-015-0810-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2015] [Accepted: 10/30/2015] [Indexed: 11/17/2022]
Abstract
Background Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled or untagged individuals, especially when the reconstruction of full-length haplotypes can be unreliable. We propose two novel approaches, least squares estimation (LS) and Approximate Bayesian Computation Markov chain Monte Carlo estimation (ABC-MCMC), to infer evolutionary genetic parameters from a collection of short-read sequences obtained from a mixed sample of anonymous DNA, using only the frequencies of nucleotides at each site and reconstructing neither the full-length alignment nor the phylogeny. Results We used simulations to evaluate the performance of these algorithms, and our results demonstrate that LS performs poorly because bootstrap 95 % Confidence Intervals (CIs) tend to under- or over-estimate the true values of the parameters. In contrast, the 95 % Highest Posterior Density (HPD) intervals recovered from ABC-MCMC enclosed the true parameter values with a rate approximately equivalent to that obtained using BEAST, a program that implements a Bayesian MCMC estimation of evolutionary parameters using full-length sequences. Because there is a loss of information with the use of sitewise nucleotide frequencies alone, the ABC-MCMC 95 % HPDs are larger than those obtained by BEAST. Conclusion We propose two novel algorithms to estimate evolutionary genetic parameters based on the proportion of each nucleotide. The LS method cannot be recommended as a standalone method for evolutionary parameter estimation. On the other hand, parameters recovered by ABC-MCMC are comparable to those obtained using BEAST, but with larger 95 % HPDs.
One major advantage of ABC-MCMC is that computational time scales linearly with the number of short-read sequences, and is independent of the number of full-length sequences in the original data. This allows us to perform the analysis on NGS datasets with large numbers of short read fragments. The source code for ABC-MCMC is available at https://github.com/stevenhwu/SF-ABC.
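To make the ABC-MCMC idea concrete, here is a generic Python sketch that infers a single parameter from site-wise frequencies alone. It is not the SF-ABC code linked above. The toy simulator, the mean site frequency as summary statistic, the uniform prior on (0, 1), and every parameter value are assumptions chosen to keep the example small.

```python
import random
import statistics

def simulate_site_frequencies(theta, n_sites, depth, rng):
    """Toy simulator: each read at each site carries the variant
    with probability theta; returns the variant frequency per site."""
    return [
        sum(rng.random() < theta for _ in range(depth)) / depth
        for _ in range(n_sites)
    ]

def distance(freqs_a, freqs_b):
    """Summary-statistic distance: difference in mean site-wise frequency."""
    return abs(statistics.fmean(freqs_a) - statistics.fmean(freqs_b))

def abc_mcmc(observed, depth, n_iter=2000, epsilon=0.01, step=0.02, seed=1):
    """ABC-MCMC with a uniform prior on (0, 1) and a symmetric Gaussian
    proposal, so a move is accepted whenever the proposal lies inside the
    prior support and its simulated data fall within epsilon of the data."""
    rng = random.Random(seed)
    # Start in a plausible region; a chain started far away could never move.
    theta = statistics.fmean(observed)
    chain = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        if 0.0 < proposal < 1.0:
            simulated = simulate_site_frequencies(
                proposal, len(observed), depth, rng
            )
            if distance(simulated, observed) <= epsilon:
                theta = proposal
        chain.append(theta)
    return chain

# Synthetic "observed" data generated with a known parameter of 0.05.
observed = simulate_site_frequencies(0.05, n_sites=20, depth=50,
                                     rng=random.Random(0))
chain = abc_mcmc(observed, depth=50)
posterior_mean = statistics.fmean(chain[500:])  # discard burn-in
```

Each iteration costs one simulation whose size depends on the number of sites and reads rather than on any reconstructed haplotypes, which is consistent with the scaling behaviour the abstract highlights.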
Affiliation(s)
- Steven H Wu
- Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA; Department of Biology, Duke University, Box 90338, Durham, NC, 27708, USA.
| | - Allen G Rodrigo
- Department of Biology, Duke University, Box 90338, Durham, NC, 27708, USA; The National Evolutionary Synthesis Center, Durham, NC, 27705, USA.
| |
|
112
|
Chedom DF, Murcia PR, Greenman CD. Inferring the Clonal Structure of Viral Populations from Time Series Sequencing. PLoS Comput Biol 2015; 11:e1004344. [PMID: 26571026 PMCID: PMC4646700 DOI: 10.1371/journal.pcbi.1004344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2014] [Accepted: 05/17/2015] [Indexed: 11/18/2022] Open
Abstract
RNA virus populations undergo processes of mutation and selection, resulting in a mixed population of viral particles. High-throughput sequencing of a viral population consequently contains a mixed signal of the underlying clones. We would like to identify the underlying evolutionary structures. We utilize two sources of information to attempt this: within-segment linkage information and mutation prevalence. We demonstrate that clone haplotypes, their prevalence, and maximum parsimony reticulate evolutionary structures can be identified, although the solutions may not be unique, even for complete sets of information. This is applied to a chain of influenza infection, where we infer evolutionary structures, including reassortment, and demonstrate some of the difficulties of interpretation that arise from deep sequencing due to artifacts such as template switching during PCR amplification.
Affiliation(s)
- Donatien F. Chedom
- The Genome Analysis Centre, Norwich Research Park, Norwich, United Kingdom
| | - Pablo R. Murcia
- MRC-University of Glasgow Centre for Virus Research, United Kingdom
| | - Chris D. Greenman
- The Genome Analysis Centre, Norwich Research Park, Norwich, United Kingdom
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
| |
|
113
|
Fahnøe U, Pedersen AG, Dräger C, Orton RJ, Blome S, Höper D, Beer M, Rasmussen TB. Creation of Functional Viruses from Non-Functional cDNA Clones Obtained from an RNA Virus Population by the Use of Ancestral Reconstruction. PLoS One 2015; 10:e0140912. [PMID: 26485566 PMCID: PMC4613144 DOI: 10.1371/journal.pone.0140912] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/22/2015] [Accepted: 10/01/2015] [Indexed: 02/05/2023] Open
Abstract
RNA viruses have the highest known mutation rates. Consequently, it is likely that a high proportion of individual RNA virus genomes, isolated from an infected host, will contain lethal mutations and be non-functional. This is problematic if the aim is to clone and investigate high-fitness, functional cDNAs and may also pose problems for sequence-based analysis of viral evolution. To address these challenges we have performed a study of the evolution of classical swine fever virus (CSFV) using deep sequencing and analysis of 84 full-length cDNA clones, each representing individual genomes from a moderately virulent isolate. In addition to serving here as a model for RNA viruses generally, CSFV has high socioeconomic importance and remains a threat to animal welfare and pig production. We find that the majority of the investigated genomes are non-functional and only 12% produced infectious RNA transcripts. Full-length sequencing of cDNA clones and deep sequencing of the parental population identified substitutions important for the observed phenotypes. The investigated cDNA clones were furthermore used as the basis for inferring the sequence of functional viruses. Since each unique clone must necessarily be the descendant of a functional ancestor, we hypothesized that it should be possible to produce functional clones by reconstructing ancestral sequences. To test this we used phylogenetic methods to infer two ancestral sequences, which were then reconstructed as cDNA clones. Viruses rescued from the reconstructed cDNAs were tested in cell culture and pigs. Both reconstructed ancestral genomes proved functional, and displayed distinct phenotypes in vitro and in vivo. We suggest that reconstruction of ancestral viruses is a useful tool for experimental and computational investigations of virulence and viral evolution. Importantly, ancestral reconstruction can be done even on the basis of a set of sequences that all correspond to non-functional variants.
Affiliation(s)
- Ulrik Fahnøe
- DTU National Veterinary Institute, Technical University of Denmark, Lindholm, Kalvehave, Denmark
- Center for Biological Sequence Analysis, DTU Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Anders Gorm Pedersen
- Center for Biological Sequence Analysis, DTU Systems Biology, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Carolin Dräger
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany
| | - Richard J Orton
- Institute of Biodiversity, Animal Health, and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
- MRC–University of Glasgow Centre for Virus Research, Institute of Infection, Inflammation and Immunity, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom
| | - Sandra Blome
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany
| | - Dirk Höper
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany
| | - Martin Beer
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut, Greifswald-Insel Riems, Germany
| | - Thomas Bruun Rasmussen
- DTU National Veterinary Institute, Technical University of Denmark, Lindholm, Kalvehave, Denmark
| |
|
114
|
Cijvat R, Manegold S, Kersten M, Klau GW, Schönhuth A, Marschall T, Zhang Y. Genome sequence analysis with MonetDB. Datenbank-Spektrum 2015. [DOI: 10.1007/s13222-015-0198-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
|
115
|
HIV-1 genotypic drug resistance testing: digging deep, reaching wide? Curr Opin Virol 2015; 14:16-23. [DOI: 10.1016/j.coviro.2015.06.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 05/23/2015] [Revised: 06/10/2015] [Accepted: 06/10/2015] [Indexed: 12/26/2022]
|
116
|
Routh A, Chang MW, Okulicz JF, Johnson JE, Torbett BE. CoVaMa: Co-Variation Mapper for disequilibrium analysis of mutant loci in viral populations using next-generation sequence data. Methods 2015; 91:40-47. [PMID: 26408523 DOI: 10.1016/j.ymeth.2015.09.021] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 08/10/2015] [Revised: 09/18/2015] [Accepted: 09/21/2015] [Indexed: 11/29/2022] Open
Abstract
Next-Generation Sequencing (NGS) has transformed our understanding of the dynamics and diversity of virus populations for human pathogens and model systems alike. Due to the sensitivity and depth of coverage in NGS, it is possible to measure the frequency of mutations that may be present even at vanishingly low frequencies within the viral population. Here, we describe a simple bioinformatic pipeline called CoVaMa (Co-Variation Mapper), scripted in Python, that detects correlated patterns of mutations in a viral sample. Our algorithm takes NGS alignment data and populates large matrices of contingency tables that correspond to every possible pairwise interaction of nucleotides in the viral genome or amino acids in the chosen open reading frame. These tables are then analysed using classical linkage disequilibrium to detect and report evidence of epistasis. We test our analysis with simulated data and then apply the approach to find epistatically linked loci in Flock House Virus genomic RNA grown under controlled cell culture conditions. We also reanalyze NGS data from a large cohort of HIV-infected patients and find correlated amino acid substitution events in the protease gene that have arisen in response to anti-viral therapy. This both confirms previous findings and suggests new pairs of interactions within HIV protease. The script is publicly available at http://sourceforge.net/projects/covama.
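To make the contingency-table step concrete, the sketch below computes the classical linkage disequilibrium coefficient D = p(AB) - p(A)p(B) and the derived r² for one pair of sites, from reads that cover both sites. It is an illustration under stated assumptions and not CoVaMa's code: the function name `linkage_disequilibrium`, the list-of-pairs input, and the choice of D and r² as the reported statistics are all assumptions, since the abstract does not specify the exact statistic used.

```python
def linkage_disequilibrium(pairs, allele_a, allele_b):
    """Classical LD between two sites from reads spanning both.

    pairs:    list of (base_at_site_i, base_at_site_j), one tuple per read
              that covers both sites (hypothetical input format).
    allele_a: allele of interest at site i.
    allele_b: allele of interest at site j.
    Returns (D, r_squared).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("no reads cover both sites")
    # Marginal and joint allele frequencies, i.e. the 2x2 contingency table
    # for (allele_a vs. other) x (allele_b vs. other), normalised by n.
    p_a = sum(1 for a, _ in pairs if a == allele_a) / n
    p_b = sum(1 for _, b in pairs if b == allele_b) / n
    p_ab = sum(1 for a, b in pairs if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared

# Toy data: A at site i tends to co-occur with G at site j.
pairs = [("A", "G")] * 40 + [("C", "T")] * 40 + [("A", "T")] * 10 + [("C", "G")] * 10
d, r_squared = linkage_disequilibrium(pairs, "A", "G")
print(d, r_squared)
```

A value of D near zero indicates that the two alleles occur together about as often as expected by chance; repeating the calculation over every pair of sites would give the kind of pairwise scan the abstract describes.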
Affiliation(s)
- Andrew Routh
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA; Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, USA.
| | - Max W Chang
- Integrative Genomics and Bioinformatics Core, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Jason F Okulicz
- Infectious Disease Service, San Antonio Military Medical Center, Fort Sam Houston, TX 78234, USA; Infectious Disease Clinical Research Program, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
| | - John E Johnson
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Bruce E Torbett
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037, USA.
| |
|
117
|
Abstract
Until recently, members of the monogeneric family Arenaviridae (arenaviruses) have been known to infect only muroid rodents and, in one case, possibly phyllostomid bats. The paradigm of arenaviruses exclusively infecting small mammals shifted dramatically when several groups independently published the detection and isolation of a divergent group of arenaviruses in captive alethinophidian snakes. Preliminary phylogenetic analyses suggest that these reptilian arenaviruses constitute a sister clade to mammalian arenaviruses. Here, the members of the International Committee on Taxonomy of Viruses (ICTV) Arenaviridae Study Group, together with other experts, outline the taxonomic reorganization of the family Arenaviridae to accommodate reptilian arenaviruses and other recently discovered mammalian arenaviruses and to improve compliance with the Rules of the International Code of Virus Classification and Nomenclature (ICVCN). PAirwise Sequence Comparison (PASC) of arenavirus genomes and NP amino acid pairwise distances support the modification of the present classification. As a result, the current genus Arenavirus is replaced by two genera, Mammarenavirus and Reptarenavirus, which are established to accommodate mammalian and reptilian arenaviruses, respectively, in the same family. The current species landscape among mammalian arenaviruses is upheld, with two new species added for Lunk and Merino Walk viruses and minor corrections to the spelling of some names. The published snake arenaviruses are distributed among three new separate reptarenavirus species. Finally, a non-Latinized binomial species name scheme is adopted for all arenavirus species. In addition, the current virus abbreviations have been evaluated, and some changes are introduced to unequivocally identify each virus in electronic databases, manuscripts, and oral proceedings.
|
118
|
Nagai M, Omatsu T, Aoki H, Kaku Y, Belsham GJ, Haga K, Naoi Y, Sano K, Umetsu M, Shiokawa M, Tsuchiaka S, Furuya T, Okazaki S, Katayama Y, Oba M, Shirai J, Katayama K, Mizutani T. Identification and complete genome analysis of a novel bovine picornavirus in Japan. Virus Res 2015; 210:205-12. [PMID: 26260333 PMCID: PMC7114519 DOI: 10.1016/j.virusres.2015.08.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 06/11/2015] [Revised: 07/24/2015] [Accepted: 08/05/2015] [Indexed: 01/04/2023]
Abstract
We identified novel viruses in feces from cattle with diarrhea collected in 2009 in Hokkaido Prefecture, Japan, by using a metagenomics approach and determined the (near) complete sequences of the viruses. Sequence analyses revealed that they had a standard picornavirus genome organization, i.e. 5' untranslated region (UTR) - L - P1 (VP4-VP3-VP2-VP1) - P2 (2A-2B-2C) - P3 (3A-3B-3C-3D) - 3'UTR - poly(A). They are closely related to other unclassified Chinese picornaviruses: bat picornaviruses group 1-3, feline picornavirus, and canine picornavirus, sharing 45.4-51.4% (P1), 38.0-44.9% (P2), and 49.6-53.3% (P3) amino acid identities, respectively. The phylogenetic analyses and detailed genome characterization showed that they, together with the unclassified Chinese picornaviruses, grouped as a cluster for the P1, 2C, 3CD and VP1 coding regions. These viruses had conserved features (e.g. predicted protein cleavage sites, presence of a leader protein, 2A, 2C, 3C, and 3D functional domains), suggesting they have a common ancestor. Reverse-transcription-PCR assays, using specific primers designed from the 5'UTR sequence of these viruses, showed that 23.0% (20/87) of fecal samples from cattle with diarrhea were positive, indicating the prevalence of these picornaviruses in the Japanese cattle population in Hokkaido Prefecture. However, further studies are needed to investigate the pathogenic potential and etiological role of these viruses in cattle.
Affiliation(s)
- Makoto Nagai
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan; Department of Veterinary Medicine, Faculty of Agriculture, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan.
| | - Tsutomu Omatsu
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Hiroshi Aoki
- Faculty of Veterinary Science, Nippon Veterinary and Life Science University, Musashino, Tokyo 180-8602, Japan
| | - Yoshihiro Kaku
- Veterinary Science, National Institute of Infectious Diseases, Shinjuku, Tokyo 162-8640, Japan
| | - Graham J Belsham
- National Veterinary Institute, Technical University of Denmark, Lindholm, DK-4771 Kalvehave, Denmark
| | - Kei Haga
- Department of Virology II, National Institute of Infectious Diseases, Musashimurayama, Tokyo 208-0011, Japan
| | - Yuki Naoi
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Kaori Sano
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Moeko Umetsu
- Faculty of Veterinary Science, Nippon Veterinary and Life Science University, Musashino, Tokyo 180-8602, Japan
| | - Mai Shiokawa
- Faculty of Veterinary Science, Nippon Veterinary and Life Science University, Musashino, Tokyo 180-8602, Japan
| | - Shinobu Tsuchiaka
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Tetsuya Furuya
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan; Department of Veterinary Medicine, Faculty of Agriculture, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Sachiko Okazaki
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Yukie Katayama
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Mami Oba
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Junsuke Shirai
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan; Department of Veterinary Medicine, Faculty of Agriculture, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| | - Kazuhiko Katayama
- Department of Virology II, National Institute of Infectious Diseases, Musashimurayama, Tokyo 208-0011, Japan
| | - Tetsuya Mizutani
- Research and Education Center for Prevention of Global Infectious Disease of Animal, Tokyo University of Agriculture and Technology, Fuchu, Tokyo 183-8509, Japan
| |
|
119
|
Kim Y, Aw TG, Teal TK, Rose JB. Metagenomic Investigation of Viral Communities in Ballast Water. Environ Sci Technol 2015; 49:8396-407. [PMID: 26107908 DOI: 10.1021/acs.est.5b01633] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Indexed: 05/22/2023]
Abstract
Ballast water is one of the most important vectors for the transport of non-native species to new aquatic environments. Due to the development of new ballast water quality standards for viruses, this study aimed to determine the taxonomic diversity and composition of viral communities (viromes) in ballast and harbor waters using metagenomics approaches. Ballast waters from different sources within the North American Great Lakes and paired harbor waters were collected around the Port of Duluth-Superior. Bioinformatics analysis of over 550 million sequences showed that a majority of the viral sequences could not be assigned to any taxa associated with reference sequences, indicating the lack of knowledge on viruses in ballast and harbor waters. However, the assigned viruses were dominated by double-stranded DNA phages, and sequences associated with potentially emerging viral pathogens of fish and shrimp were detected with low amino acid similarity in both ballast and harbor waters. Annotation-independent comparisons showed that viromes were distinct among the Great Lakes, and the Great Lakes viromes were closely related to viromes of other cold natural freshwater systems but distant from viromes of marine and human-designed or human-managed freshwater systems. These results represent the most detailed characterization to date of viruses in ballast water, demonstrating their diversity and the potential significance of the ship-mediated spread of viruses.
Affiliation(s)
- Yiseul Kim
- †Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Tiong Gim Aw
- ‡Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan 48824, United States
| | - Tracy K Teal
- †Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Joan B Rose
- †Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824, United States
- ‡Department of Fisheries and Wildlife, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
120
|
Schaerer V, Haubitz S, Kovari H, Ledergerber B, Ambrosioni J, Cavassini M, Stoeckle M, Schmid P, Decosterd L, Aouri M, Böni J, Günthard HF, Furrer H, Metzner KJ, Fehr J, Rauch A. Protease inhibitors to treat hepatitis C in the Swiss HIV Cohort Study: high efficacy but low treatment uptake. HIV Med 2015; 16:599-607. [PMID: 26135140 DOI: 10.1111/hiv.12269] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Accepted: 02/13/2015] [Indexed: 12/16/2022]
Abstract
Objectives: Direct-acting antiviral agents (DAAs) have become the standard of care for the treatment of chronic hepatitis C virus (HCV) infection. We aimed to assess treatment uptake and efficacy in routine clinical settings among HIV/HCV coinfected patients after the introduction of the first generation DAAs.
Methods: Data on all Swiss HIV Cohort Study (SHCS) participants starting HCV protease inhibitor (PI) treatment between September 2011 and August 2013 were collected prospectively. The uptake and efficacy of HCV therapy were compared with those in the time period before the availability of PIs.
Results: Upon approval of PI treatment in Switzerland in September 2011, 516 SHCS participants had chronic HCV genotype 1 infection. Of these, 57 (11%) started HCV treatment during the following 2 years with either telaprevir, faldaprevir or boceprevir. Twenty-seven (47%) patients were treatment-naïve, nine (16%) were patients with relapse and 21 (37%) were partial or null responders. Twenty-nine (57%) had advanced fibrosis and 15 (29%) had cirrhosis. End-of-treatment virological response was 84% in treatment-naïve patients, 88% in patients with relapse and 62% in previous nonresponders. Sustained virological response was 78%, 86% and 40% in treatment-naïve patients, patients with relapse and nonresponders, respectively. Treatment uptake was similar before (3.8 per 100 patient-years) and after (6.1 per 100 patient-years) the introduction of PIs, while treatment efficacy increased considerably after the introduction of PIs.
Conclusions: The introduction of PI-based HCV treatment in HIV/HCV-coinfected patients improved virological response rates, while treatment uptake remained low. Therefore, the introduction of PIs into the clinical routine was beneficial at the individual level, but had only a modest effect on the burden of HCV infection at the population level.
Affiliation(s)
- V Schaerer
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
| | - S Haubitz
- Department of Infectious Diseases, Bern University Hospital and University of Bern, Bern, Switzerland
| | - H Kovari
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
| | - B Ledergerber
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
| | | | - M Cavassini
- University Hospital Lausanne, Lausanne, Switzerland
| | - M Stoeckle
- University Hospital Basel, Basel, Switzerland
| | - P Schmid
- Kantonsspital St. Gallen, St. Gallen, Switzerland
| | - L Decosterd
- University Hospital Lausanne, Lausanne, Switzerland
| | - M Aouri
- University Hospital Lausanne, Lausanne, Switzerland
| | - J Böni
- University of Zurich, Institute of Medical Virology, Zurich, Switzerland
| | - H F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
| | - H Furrer
- Department of Infectious Diseases, Bern University Hospital and University of Bern, Bern, Switzerland
| | - K J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
| | - J Fehr
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital of Zurich, University of Zurich, Zurich, Switzerland
| | - A Rauch
- Department of Infectious Diseases, Bern University Hospital and University of Bern, Bern, Switzerland
| | | |
|
121
|
Liu Y, Chiaromonte F, Ross H, Malhotra R, Elleder D, Poss M. Error correction and statistical analyses for intra-host comparisons of feline immunodeficiency virus diversity from high-throughput sequencing data. BMC Bioinformatics 2015; 16:202. [PMID: 26123018 PMCID: PMC4486422 DOI: 10.1186/s12859-015-0607-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/06/2014] [Accepted: 04/29/2015] [Indexed: 11/16/2022] Open
Abstract
Background: Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3′ half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology.
Results: Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies, and it quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. Altogether, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy.
Conclusions: Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0607-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yang Liu
- Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
| | - Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
| | - Howard Ross
- Bioinformatics Institute, School of Biological Sciences, University of Auckland, Auckland, 1142, New Zealand.
| | - Raunaq Malhotra
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA.
| | - Daniel Elleder
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA; Current address: Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Videnska 1083, Prague, 14000, Czech Republic.
| | - Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
| |
|
122
|
Pulido-Tamayo S, Sánchez-Rodríguez A, Swings T, Van den Bergh B, Dubey A, Steenackers H, Michiels J, Fostier J, Marchal K. Frequency-based haplotype reconstruction from deep sequencing data of bacterial populations. Nucleic Acids Res 2015; 43:e105. [PMID: 25990729 PMCID: PMC4652744 DOI: 10.1093/nar/gkv478] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/20/2015] [Accepted: 04/29/2015] [Indexed: 11/23/2022] Open
Abstract
Clonal populations accumulate mutations over time, resulting in different haplotypes. Deep sequencing of such a population in principle provides information to reconstruct these haplotypes and the frequency at which the haplotypes occur. However, this reconstruction is technically not trivial, especially in clonal systems with a relatively low mutation frequency. The low number of segregating sites in those systems adds ambiguity to the haplotype phasing and thus precludes the reconstruction of genome-wide haplotypes based on sequence overlap information alone. Therefore, we present EVORhA, a haplotype reconstruction method that complements phasing information in the non-empty read overlap with the frequency estimations of inferred local haplotypes. As was shown with simulated data, as soon as read lengths and/or mutation rates become restrictive for state-of-the-art methods, the use of this additional frequency information allows EVORhA to still reliably reconstruct genome-wide haplotypes. On real data, we show the applicability of the method in reconstructing the population composition of evolved bacterial populations and in decomposing mixed bacterial infections from clinical samples.
Affiliation(s)
- Sergio Pulido-Tamayo
- Department of Information Technology, Ghent University, iMinds, 9050 Gent, Belgium; Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
| | - Aminael Sánchez-Rodríguez
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium; Departamento de Ciencias Naturales, Universidad Técnica Particular de Loja, San Cayetano Alto S/N, EC1101608 Loja, Ecuador
| | - Toon Swings
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
| | - Bram Van den Bergh
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
| | - Akanksha Dubey
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
| | - Hans Steenackers
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
| | - Jan Michiels
- Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
| | - Jan Fostier
- Department of Information Technology, Ghent University, iMinds, 9050 Gent, Belgium
| | - Kathleen Marchal
- Department of Information Technology, Ghent University, iMinds, 9050 Gent, Belgium; Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics, KU Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
|
123
|
Ho CKY, Welkers MRA, Thomas XV, Sullivan JC, Kieffer TL, Reesink HW, Rebers SPH, de Jong MD, Schinkel J, Molenkamp R. A comparison of 454 sequencing and clonal sequencing for the characterization of hepatitis C virus NS3 variants. J Virol Methods 2015; 219:28-37. [PMID: 25818622 DOI: 10.1016/j.jviromet.2015.03.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/08/2015] [Revised: 03/17/2015] [Accepted: 03/18/2015] [Indexed: 01/09/2023]
Abstract
We compared 454 amplicon sequencing with clonal sequencing for the characterization of intra-host hepatitis C virus (HCV) NS3 variants. Clonal and 454 sequences were obtained from 12 patients enrolled in a clinical phase I study for telaprevir, an NS3-4a protease inhibitor. Thirty-nine datasets were used to compare the consensus sequence, average pairwise distance, normalized Shannon entropy, phylogenetic tree topology and the number and frequency of variants derived from both sequencing techniques. In general, a good concordance was observed between both techniques for the majority of datasets. Discordant results were observed for 5 out of 39 clonal and 454 datasets, which could be attributed to primer-related selective amplification used for clonal sequencing. Both 454 and clonal datasets consisted of a few major variants and a large number of low-frequency variants. Telaprevir resistance-associated variants were observed in low frequencies and were detected more often by 454. We conclude that performance of 454 and clonal sequencing is comparable for the characterization of intra-host virus populations. Not surprisingly, 454 is superior for the detection of low frequency resistance-associated variants. However, despite the greater coverage, 454 failed to detect some low frequency variants detected by clonal sequencing.
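One of the diversity measures named in this abstract, normalized Shannon entropy, can be illustrated with a minimal sketch. It assumes the common convention of dividing the entropy of the variant frequencies by ln(N), where N is the number of sequences sampled; the abstract does not state which normalization the authors used, and the function name is hypothetical.

```python
import math
from collections import Counter


def normalized_shannon_entropy(sequences):
    """Shannon entropy of variant frequencies divided by ln(N).

    `sequences` is a list of variant sequences (one per clone or read).
    Normalizing by ln(N) is an assumed convention, not taken from the paper.
    """
    n = len(sequences)
    if n < 2:
        return 0.0
    counts = Counter(sequences)
    # Sum of -p * ln(p) over the distinct variants observed.
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)
```

Under this convention the value is 0 when every sequence is identical and 1 when every sequence is distinct, which is what makes datasets of different depth roughly comparable.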
Affiliation(s)
- Cynthia K Y Ho
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Matthijs R A Welkers
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Xiomara V Thomas
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - James C Sullivan
- Department of Infectious Diseases, Vertex Pharmaceuticals Incorporated, Cambridge, MA 02139, USA.
| | - Tara L Kieffer
- Department of Infectious Diseases, Vertex Pharmaceuticals Incorporated, Cambridge, MA 02139, USA.
| | - Henk W Reesink
- Department of Gastroenterology and Hepatology, Academic Medical Center, Amsterdam 1104 AZ, The Netherlands.
| | - Sjoerd P H Rebers
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Menno D de Jong
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Janke Schinkel
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Richard Molenkamp
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
|
124
|
Rossi LMG, Escobar-Gutierrez A, Rahal P. Advanced molecular surveillance of hepatitis C virus. Viruses 2015; 7:1153-88. [PMID: 25781918 PMCID: PMC4379565 DOI: 10.3390/v7031153] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/17/2014] [Revised: 02/05/2015] [Accepted: 02/20/2015] [Indexed: 12/12/2022] Open
Abstract
Hepatitis C virus (HCV) infection is an important public health problem worldwide. HCV exploits complex molecular mechanisms, which result in a high degree of intrahost genetic heterogeneity. This high degree of variability represents a challenge for the accurate establishment of genetic relatedness between cases and complicates the identification of sources of infection. Tracking HCV infections is crucial for the elucidation of routes of transmission in a variety of settings. Therefore, implementation of HCV advanced molecular surveillance (AMS) is essential for disease control. Accounting for virulence is also important for HCV AMS and both viral and host factors contribute to the disease outcome. Therefore, HCV AMS requires the incorporation of host factors as an integral component of the algorithms used to monitor disease occurrence. Importantly, implementation of comprehensive global databases and data mining are also needed for the proper study of the mechanisms responsible for HCV transmission. Here, we review molecular aspects associated with HCV transmission, as well as the most recent technological advances used for virus and host characterization. Additionally, the cornerstone discoveries that have defined the pathway for viral characterization are presented and the importance of implementing advanced HCV molecular surveillance is highlighted.
Affiliation(s)
- Livia Maria Gonçalves Rossi
- Department of Biology, Institute of Bioscience, Language and Exact Science, Sao Paulo State University, Sao Jose do Rio Preto, SP 15054-000, Brazil.
| | - Paula Rahal
- Department of Biology, Institute of Bioscience, Language and Exact Science, Sao Paulo State University, Sao Jose do Rio Preto, SP 15054-000, Brazil.
|
125
|
Ogishi M, Yotsuyanagi H, Tsutsumi T, Gatanaga H, Ode H, Sugiura W, Moriya K, Oka S, Kimura S, Koike K. Deconvoluting the composition of low-frequency hepatitis C viral quasispecies: comparison of genotypes and NS3 resistance-associated variants between HCV/HIV coinfected hemophiliacs and HCV monoinfected patients in Japan. PLoS One 2015; 10:e0119145. [PMID: 25748426 PMCID: PMC4351984 DOI: 10.1371/journal.pone.0119145] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/12/2014] [Accepted: 01/09/2015] [Indexed: 12/16/2022] Open
Abstract
Pre-existing low-frequency resistance-associated variants (RAVs) may jeopardize successful sustained virological responses (SVR) to HCV treatment with direct-acting antivirals (DAAs). However, the potential impact of low-frequency (∼0.1%) mutations, concatenated mutations (haplotypes), and their association with genotypes (Gts) on the treatment outcome has not yet been elucidated, most probably owing to the difficulty in detecting pre-existing minor haplotypes with sufficient length and accuracy. Herein, we characterize a methodological framework based on Illumina MiSeq next-generation sequencing (NGS) coupled with bioinformatics of quasispecies reconstruction (QSR) to realize highly accurate variant calling and genotype-haplotype detection. The core-to-NS3 protease coding sequences in 10 HCV monoinfected patients, 5 of whom had a history of blood transfusion, and 11 HCV/HIV coinfected patients with hemophilia, were studied. Simulation experiments showed that, for minor variants constituting more than 1%, our framework achieved a positive predictive value (PPV) of 100% and sensitivities of 91.7–100% for genotyping and 80.6% for RAV screening. Genotyping analysis indicated the prevalence of dominant Gt1a infection in coinfected patients (6/11 vs 0/10, p = 0.01). For clinical samples, minor genotype overlapping infection was prevalent in HCV/HIV coinfected hemophiliacs (10/11) and patients who experienced whole-blood transfusion (4/5) but none in patients without exposure to blood (0/5). As for RAV screening, the Q80K/R and S122K/R variants were particularly prevalent among minor RAVs observed, detected in 12/21 and 6/21 cases, respectively. Q80K was detected only in coinfected patients, whereas Q80R was predominantly detected in monoinfected patients (1/11 vs 7/10, p < 0.01). Multivariate interdependence analysis revealed the previously unrecognized prevalence of Gt1b-Q80K, in HCV/HIV coinfected hemophiliacs [Odds ratio = 13.4 (3.48–51.9), p < 0.01]. 
Our study revealed the distinct characteristics of viral quasispecies between the subgroups specified above and the feasibility of NGS and QSR-based genetic deconvolution of pre-existing minor Gts, RAVs, and their interrelationships.
Affiliation(s)
- Masato Ogishi
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Hiroshi Yotsuyanagi
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Takeya Tsutsumi
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Hiroyuki Gatanaga
- AIDS Clinical Center, National Center for Global Health and Medicine, Shinjuku, Tokyo, Japan
| | - Hirotaka Ode
- Department of Infectious Diseases and Immunology, Clinical Research Center, Nagoya Medical Center, Nagoya, Japan
| | - Wataru Sugiura
- Department of Infectious Diseases and Immunology, Clinical Research Center, Nagoya Medical Center, Nagoya, Japan
| | - Kyoji Moriya
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
| | - Shinichi Oka
- AIDS Clinical Center, National Center for Global Health and Medicine, Shinjuku, Tokyo, Japan
| | - Satoshi Kimura
- Director, Tokyo Teishin Hospital, Tokyo, Japan; President, Tokyo Health Care University, Tokyo, Japan
| | - Kazuhiko Koike
- Department of Internal Medicine, Graduate School of Medicine, University of Tokyo, Bunkyo, Tokyo, Japan
|
126
|
Pérez-Losada M, Arenas M, Galán JC, Palero F, González-Candelas F. Recombination in viruses: mechanisms, methods of study, and evolutionary consequences. Infect Genet Evol 2015; 30:296-307. [PMID: 25541518 PMCID: PMC7106159 DOI: 10.1016/j.meegid.2014.12.022] [Citation(s) in RCA: 230] [Impact Index Per Article: 25.6] [Received: 11/23/2014] [Revised: 12/15/2014] [Accepted: 12/17/2014] [Indexed: 02/08/2023]
Abstract
Recombination is a pervasive process generating diversity in most viruses. It joins variants that arise independently within the same molecule, creating new opportunities for viruses to overcome selective pressures and to adapt to new environments and hosts. Consequently, the analysis of viral recombination attracts the interest of clinicians, epidemiologists, molecular biologists and evolutionary biologists. In this review we present an overview of three major areas related to viral recombination: (i) the molecular mechanisms that underlie recombination in model viruses, including DNA-viruses (Herpesvirus) and RNA-viruses (Human Influenza Virus and Human Immunodeficiency Virus), (ii) the analytical procedures to detect recombination in viral sequences and to determine the recombination breakpoints, along with the conceptual and methodological tools currently used and a brief overview of the impact of new sequencing technologies on the detection of recombination, and (iii) the major areas in the evolutionary analysis of viral populations on which recombination has an impact. These include the evaluation of selective pressures acting on viral populations, the application of evolutionary reconstructions in the characterization of centralized genes for vaccine design, and the evaluation of linkage disequilibrium and population structure.
Affiliation(s)
- Marcos Pérez-Losada
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Portugal; Computational Biology Institute, George Washington University, Ashburn, VA 20147, USA
| | - Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa", Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
| | - Juan Carlos Galán
- Servicio de Microbiología, Hospital Ramón y Cajal and Instituto Ramón y Cajal de Investigación Sanitaria (IRYCIS), Madrid, Spain; CIBER en Epidemiología y Salud Pública, Spain
| | - Ferran Palero
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain
| | - Fernando González-Candelas
- CIBER en Epidemiología y Salud Pública, Spain; Unidad Mixta Infección y Salud Pública, FISABIO-Universitat de València, Valencia, Spain.
|
127
|
Verbist B, Clement L, Reumers J, Thys K, Vapirev A, Talloen W, Wetzels Y, Meys J, Aerssens J, Bijnens L, Thas O. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering. BMC Bioinformatics 2015; 16:59. [PMID: 25887734 PMCID: PMC4369097 DOI: 10.1186/s12859-015-0458-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/01/2014] [Accepted: 12/16/2014] [Indexed: 11/10/2022] Open
Abstract
Background Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Results Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. Conclusions ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. 
They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0458-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Bie Verbist
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium.
| | - Lieven Clement
- Department of Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, Gent, 9000, Belgium.
| | - Joke Reumers
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Kim Thys
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Alexander Vapirev
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium; ExaScience Life Lab, Kapeldreef 75, Leuven, 3001, Belgium.
| | - Willem Talloen
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Yves Wetzels
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Joris Meys
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium.
| | - Jeroen Aerssens
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Luc Bijnens
- Janssen R&D, Janssen Pharmaceutical Companies of J&J, Turnhoutseweg 30, Beerse, 2340, Belgium.
| | - Olivier Thas
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, Gent, 9000, Belgium; University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW, 2522, Australia.
|
128
|
Yang C, Zhao X, Sun D, Yang L, Chong C, Pan Y, Chi X, Gao Y, Wang M, Shi X, Sun H, Lv J, Gao Y, Zhong J, Niu J, Sun B. Interferon alpha (IFNα)-induced TRIM22 interrupts HCV replication by ubiquitinating NS5A. Cell Mol Immunol 2015; 13:94-102. [PMID: 25683609 DOI: 10.1038/cmi.2014.131] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 07/21/2014] [Revised: 12/05/2014] [Accepted: 12/05/2014] [Indexed: 12/28/2022] Open
Abstract
TRIM22, a tripartite-motif (TRIM) protein, is upregulated upon interferon alpha (IFNα) administration to hepatitis C virus (HCV)-infected patients. However, the physiological role of TRIM22 upregulation remains unclear. Here, we describe a potential antiviral function of TRIM22's targeting of the HCV NS5A protein. NS5A is important for HCV replication and for resistance to IFNα therapy. During the first 24 h following the initiation of IFNα treatment, upregulation of TRIM22 in the peripheral blood mononuclear cells (PBMCs) of HCV patients correlated with a decrease in viral titer. This phenomenon was confirmed in the hepatocyte-derived cell line Huh-7, which is highly permissive for HCV infection. TRIM22 over-expression inhibited HCV replication, and Small interfering RNA (siRNA)-mediated knockdown of TRIM22 diminished IFNα-induced anti-HCV function. Furthermore, we determined that TRIM22 ubiquitinates NS5A in a concentration-dependent manner. In summary, our results suggest that TRIM22 upregulation is associated with HCV decline during IFNα treatment and plays an important role in controlling HCV replication in vitro.
Affiliation(s)
- Chen Yang
- State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Xinhao Zhao
- Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Dakang Sun
- Experiment Center of Clinical Medicine, Affiliated Hospital of Binzhou Medical University, Binzhou, China
| | - Leilei Yang
- Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Chang Chong
- Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu Pan
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Xiumei Chi
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Yanhang Gao
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Moli Wang
- Infectious Diseases Department, Fourth Hospital, University of Jilin, Changchun, China
| | - Xiaodong Shi
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Haibo Sun
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Juan Lv
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Yuanda Gao
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Jin Zhong
- Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Junqi Niu
- Hepatology Section, First Hospital, University of Jilin, Changchun, China
| | - Bing Sun
- State Key Laboratory of Cell Biology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
129
|
Yuan K, Sakoparnig T, Markowetz F, Beerenwinkel N. BitPhylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biol 2015; 16:36. [PMID: 25786108 PMCID: PMC4359483 DOI: 10.1186/s13059-015-0592-6] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/12/2014] [Accepted: 01/21/2015] [Indexed: 11/28/2022] Open
Abstract
Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm.
Affiliation(s)
- Ke Yuan
- University of Cambridge, Cancer Research UK Cambridge Institute, Cambridge, UK
| | - Thomas Sakoparnig
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Current address: Biozentrum, University of Basel, Basel, Switzerland
| | - Florian Markowetz
- University of Cambridge, Cancer Research UK Cambridge Institute, Cambridge, UK
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
130
|
Welkers MRA, Jonges M, Jeeninga RE, Koopmans MPG, de Jong MD. Improved detection of artifactual viral minority variants in high-throughput sequencing data. Front Microbiol 2015; 5:804. [PMID: 25657642 PMCID: PMC4302989 DOI: 10.3389/fmicb.2014.00804] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/01/2014] [Accepted: 12/29/2014] [Indexed: 02/05/2023] Open
Abstract
High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification is limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by in duplo reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after "best practice" quality control (QC), within the plasmid pool, one minority variant with a frequency >0.5% was identified, while 84 and 139 were identified in the RT-PCR amplified samples, indicating RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by addition of two QC steps 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants, increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs).
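The first technical characteristic described in this abstract, a variant supported predominantly by reads in a single orientation, can be expressed as a simple filter. The sketch below is only an illustration: the function name and the 10% threshold are assumptions and are not taken from the authors' published code.

```python
def is_strand_biased(forward_support, reverse_support, min_minor_strand_fraction=0.1):
    """Flag a minority variant whose support comes mostly from one read orientation.

    `forward_support` and `reverse_support` are counts of variant-supporting
    reads on each strand. The default threshold is a hypothetical value.
    """
    total = forward_support + reverse_support
    if total == 0:
        # No supporting reads at all: treat as unsupported, hence flagged.
        return True
    minor = min(forward_support, reverse_support)
    return minor / total < min_minor_strand_fraction
```

A variant seen in 98 forward and 2 reverse reads would be flagged under these assumptions, while one seen in 55 forward and 45 reverse reads would pass.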
Affiliation(s)
| | - Marcel Jonges
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, Netherlands
- Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands
| | - Rienk E. Jeeninga
- Department of Medical Microbiology, Academic Medical Centre, Amsterdam, Netherlands
| | - Marion P. G. Koopmans
- Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, Netherlands
- Department of Viroscience, Erasmus Medical Center, Rotterdam, Netherlands
| | - Menno D. de Jong
- Department of Medical Microbiology, Academic Medical Centre, Amsterdam, Netherlands
|
131
|
Preciado MV, Valva P, Escobar-Gutierrez A, Rahal P, Ruiz-Tovar K, Yamasaki L, Vazquez-Chacon C, Martinez-Guarneros A, Carpio-Pedroza JC, Fonseca-Coronado S, Cruz-Rivera M. Hepatitis C virus molecular evolution: Transmission, disease progression and antiviral therapy. World J Gastroenterol 2014; 20:15992-16013. [PMID: 25473152 PMCID: PMC4239486 DOI: 10.3748/wjg.v20.i43.15992] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 05/07/2014] [Revised: 06/22/2014] [Accepted: 08/28/2014] [Indexed: 02/06/2023] Open
Abstract
Hepatitis C virus (HCV) infection represents an important public health problem worldwide. Reduction of HCV morbidity and mortality is a current challenge owing to several viral and host factors. Virus molecular evolution plays an important role in HCV transmission, disease progression and therapy outcome. The high degree of genetic heterogeneity characteristic of HCV is a key element for the rapid adaptation of the intrahost viral population to different selection pressures (e.g., host immune responses and antiviral therapy). HCV molecular evolution is shaped by different mechanisms including a high mutation rate, genetic bottlenecks, genetic drift, recombination, temporal variations and compartmentalization. These evolutionary processes constantly rearrange the composition of the HCV intrahost population in a staging manner. Remarkable advances in the understanding of the molecular mechanism controlling HCV replication have facilitated the development of a plethora of direct-acting antiviral agents against HCV. As a result, superior sustained viral responses have been attained. The rapidly evolving field of anti-HCV therapy is expected to broaden its landscape even further with newer, more potent antivirals, bringing us one step closer to the interferon-free era.
|
132
|
Abstract
Fitness is a central quantity in evolutionary models of viruses. However, it remains difficult to determine viral fitness experimentally, and existing in vitro assays can be poor predictors of in vivo fitness of viral populations within their hosts. Next-generation sequencing can nowadays provide snapshots of evolving virus populations, and these data offer new opportunities for inferring viral fitness. Using the equilibrium distribution of the quasispecies model, an established model of intrahost viral evolution, we linked fitness parameters to the composition of the virus population, which can be estimated by next-generation sequencing. For inference, we developed a Bayesian Markov chain Monte Carlo method to sample from the posterior distribution of fitness values. The sampler can overcome situations where no maximum-likelihood estimator exists, and it can adaptively learn the posterior distribution of highly correlated fitness landscapes without prior knowledge of their shape. We tested our approach on simulated data and applied it to clinical human immunodeficiency virus 1 samples to estimate their fitness landscapes in vivo. The posterior fitness distributions allowed for differentiating viral haplotypes from each other, for determining neutral haplotype networks, in which no haplotype is more or less credibly fit than any other, and for detecting epistasis in fitness landscapes. Our implemented approach, called QuasiFit, is available at http://www.cbg.ethz.ch/software/quasifit.
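The link between fitness values and population composition that this abstract relies on can be illustrated with the forward direction of the quasispecies model: given fitness values and mutation probabilities, the equilibrium haplotype frequencies are the normalized dominant eigenvector of the mutation-selection matrix. The sketch below computes that equilibrium by power iteration. It is not the QuasiFit sampler, which solves the inverse problem with Bayesian MCMC; the function name and the toy inputs in the usage note are assumptions.

```python
def quasispecies_equilibrium(fitness, mutation, iterations=1000):
    """Equilibrium haplotype frequencies under the quasispecies model.

    fitness[j]     : replication rate of haplotype j
    mutation[i][j] : probability that a copy of haplotype j is haplotype i
                     (each column sums to 1)
    Power iteration on W[i][j] = mutation[i][j] * fitness[j] converges to the
    dominant eigenvector, normalized here to sum to 1.
    """
    n = len(fitness)
    p = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [
            sum(mutation[i][j] * fitness[j] * p[j] for j in range(n))
            for i in range(n)
        ]
        total = sum(nxt)
        p = [x / total for x in nxt]
    return p
```

For a toy two-haplotype system with `fitness = [2.0, 1.0]` and `mutation = [[0.9, 0.1], [0.1, 0.9]]`, the fitter haplotype dominates at equilibrium but the less fit one persists through mutation, which is the kind of composition that deep sequencing can estimate.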
|
133
|
Wood GR, Burroughs NJ, Evans DJ, Ryabov EV. Error correction and diversity analysis of population mixtures determined by NGS. PeerJ 2014; 2:e645. [PMID: 25405074 PMCID: PMC4232844 DOI: 10.7717/peerj.645] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/17/2014] [Accepted: 10/10/2014] [Indexed: 11/20/2022] Open
Abstract
The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. First, a method for correction of next generation sequencing error in the distribution of nucleotides at a site is developed. Second, a package of methods for assessment of nucleotide diversity is assembled. The error correction method is statistically based and works at the level of the nucleotide distribution rather than the level of individual nucleotides. The method relies on an error model and a sample of known viral genotypes that is used for model calibration. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated. The methods are illustrated using honeybee viral samples. Software in both Excel and Matlab and a guide are available at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/, the Warwick University Systems Biology Centre software download site.
Affiliation(s)
- Graham R Wood
- Warwick Systems Biology Centre, University of Warwick , Coventry , United Kingdom
| | - Nigel J Burroughs
- Warwick Systems Biology Centre, University of Warwick , Coventry , United Kingdom
| | - David J Evans
- School of Life Sciences, University of Warwick , Coventry , United Kingdom
| | - Eugene V Ryabov
- School of Life Sciences, University of Warwick , Coventry , United Kingdom
| |
|
134
|
Jayasundara D, Saeed I, Maheswararajah S, Chang B, Tang SL, Halgamuge SK. ViQuaS: an improved reconstruction pipeline for viral quasispecies spectra generated by next-generation sequencing. Bioinformatics 2014; 31:886-96. [DOI: 10.1093/bioinformatics/btu754] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 12/16/2022] Open
|
135
|
Ram D, Leshkowitz D, Gonzalez D, Forer R, Levy I, Chowers M, Lorber M, Hindiyeh M, Mendelson E, Mor O. Evaluation of GS Junior and MiSeq next-generation sequencing technologies as an alternative to Trugene population sequencing in the clinical HIV laboratory. J Virol Methods 2014; 212:12-6. [PMID: 25445792 DOI: 10.1016/j.jviromet.2014.11.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 07/25/2014] [Revised: 11/02/2014] [Accepted: 11/04/2014] [Indexed: 01/20/2023]
Abstract
Population HIV-1 sequencing is currently the method of choice for the identification and follow-up of HIV-1 antiretroviral drug resistance. It has limited sensitivity and results in a consensus sequence showing the most prevalent nucleotide per position. Moreover, concomitant sequencing and interpretation of the results for several samples together is laborious and time-consuming. In this study, the practical use of GS Junior and MiSeq bench-top next-generation sequencing (NGS) platforms as an alternative to Trugene Sanger-based population sequencing in the clinical HIV laboratory was assessed. DeepChek®-HIV TherapyEdge software was used for processing all the protease and reverse transcriptase sequences and for resistance interpretation. Plasma samples from nine HIV-1 carriers, representing the major HIV-1 subtypes in Israel, were compared. The total number of amino acid substitutions identified in the nine samples by GS Junior (232 substitutions) and MiSeq (243 substitutions) was similar and higher than Trugene (181 substitutions), emphasizing the advantage of deep sequencing over population sequencing. More than 80% of the identified substitutions were identical between the GS Junior and MiSeq platforms, most of which (184 of 199) occurred at similar frequency. Low-abundance substitutions accounted for 20.9% of the MiSeq and 21.9% of the GS Junior output, the majority of which were not detected by Trugene. More drug resistance mutations were identified by both NGS platforms, primarily, but not only, at low abundance. In conclusion, in combination with DeepChek, both GS Junior and MiSeq were found to be more sensitive than Trugene and adequate for HIV-1 resistance analysis in the clinical HIV laboratory.
Affiliation(s)
- Daniela Ram
- National HIV Reference Laboratory, Central Virology Laboratory, Ministry of Health, Tel-Hashomer, Ramat-Gan, Israel.
| | - Dena Leshkowitz
- Bioinformatics Unit, The Nancy and Stephen Grand National Center for Personalized Medicine, Weizmann Institute, Rehovot, Israel.
| | - Itzchak Levy
- Infectious Disease Unit, Sheba Medical Center, Tel-Hashomer, Ramat-Gan, Israel.
| | - Michal Chowers
- Infectious Disease Unit, Meir Medical Center, Kfar Saba, Israel.
| | - Margalit Lorber
- Autoimmune Disease Unit, Rambam Medical Center, Haifa, Israel.
| | - Musa Hindiyeh
- National HIV Reference Laboratory, Central Virology Laboratory, Ministry of Health, Tel-Hashomer, Ramat-Gan, Israel; Tel-Aviv University, Tel-Aviv, Israel.
| | - Ella Mendelson
- National HIV Reference Laboratory, Central Virology Laboratory, Ministry of Health, Tel-Hashomer, Ramat-Gan, Israel; Tel-Aviv University, Tel-Aviv, Israel.
| | - Orna Mor
- National HIV Reference Laboratory, Central Virology Laboratory, Ministry of Health, Tel-Hashomer, Ramat-Gan, Israel.
| |
|
136
|
Rizzi R, Tomescu AI, Mäkinen V. On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly. BMC Bioinformatics 2014; 15 Suppl 9:S5. [PMID: 25252805 PMCID: PMC4168716 DOI: 10.1186/1471-2105-15-s9-s5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: Multi-assembly problems have attracted much attention in recent years, as Next-Generation Sequencing technologies have started being applied to mixed settings, such as reads from the transcriptome (RNA-Seq), or from viral quasi-species. One classical model that has resurfaced in many multi-assembly methods (e.g. in Cufflinks, ShoRAH, BRANCH, CLASS) is the Minimum Path Cover (MPC) Problem, which asks for the minimum number of directed paths that cover all the nodes of a directed acyclic graph. The MPC Problem is highly popular because the acyclicity of the graph ensures its polynomial-time solvability. RESULTS: In this paper, we consider two generalizations of it dealing with integrating constraints arising from long reads or paired-end reads; these extensions have also been considered by two recent methods, but not fully solved. More specifically, we study the two problems in which a set of subpaths, or pairs of subpaths, of the graph must also be entirely covered by some path in the MPC. We show that in the case of long reads (subpaths), the generalized problem can be solved in polynomial time by a reduction to the classical MPC Problem. We also consider the weighted case, and show that it can be solved in polynomial time by a reduction to a min-cost circulation problem. As a side result, we also improve the time complexity of the classical minimum weight MPC Problem. In the case of paired-end reads (pairs of subpaths), the generalized problem becomes NP-hard, but we show that it is fixed-parameter tractable (FPT) in the total number of constraints. This computational dichotomy between long reads and paired-end reads is also a general insight into multi-assembly problems.
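As an illustration of why the classical MPC Problem is polynomial-time solvable on a DAG, the sketch below computes the size of a minimum path cover in which paths may share nodes. It uses a textbook reduction (bipartite matching on the transitive closure), not the algorithms of the paper, and it does not handle the subpath constraints studied there; the function name and input format are assumptions made for the example.

```python
def min_path_cover_size(n, edges):
    """Minimum number of directed paths covering all nodes of a DAG.

    Nodes are 0..n-1 and edges is a list of (u, v) pairs. Paths are
    allowed to share nodes, so the answer equals n minus a maximum
    bipartite matching in the transitive closure of the graph.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    # Transitive closure: reach[s] holds every node reachable from s.
    reach = []
    for s in range(n):
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach.append(seen)

    match_right = [-1] * n  # match_right[v] = left node matched to v

    def try_augment(u, visited):
        # Kuhn's augmenting-path search from left node u.
        for v in reach[u]:
            if v in visited:
                continue
            visited.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    return n - matching


# Two sources merge at node 2, which then splits into two sinks.
# The paths 0-2-3 and 1-2-4 cover every node while sharing node 2.
cover = min_path_cover_size(5, [(0, 2), (1, 2), (2, 3), (2, 4)])
```

If paths were required to be node-disjoint, the same matching would be run on the original edges instead of the closure.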
Affiliation(s)
- Romeo Rizzi
- Department of Computer Science, University of Verona, Italy
| | - Alexandru I Tomescu
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Veli Mäkinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| |
|
137
|
Verbist BMP, Thys K, Reumers J, Wetzels Y, Van der Borght K, Talloen W, Aerssens J, Clement L, Thas O. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering. Bioinformatics 2014; 31:94-101. [PMID: 25178459 DOI: 10.1093/bioinformatics/btu587] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 01/24/2023]
Abstract
MOTIVATION: In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from real low-frequency mutations. RESULTS: A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is embedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. AVAILABILITY: VirVarSeq is available, together with a user's guide and test data, on SourceForge: http://sourceforge.net/projects/virtools/?source=directory.
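The idea of filtering codon calls on base quality can be sketched as follows. This is not Q-cpileup: the abstract describes an adaptive, per-sample threshold, whereas this example assumes a fixed Phred cutoff (`min_q`), a hypothetical input format of `(codon, qualities)` pairs for one codon position, and an illustrative function name.

```python
from collections import Counter


def codon_frequencies(observations, min_q=30):
    """Codon frequencies at one position after quality filtering.

    observations: iterable of (codon, quals), where quals holds the
                  three Phred scores of the codon's bases.
    min_q:        assumed fixed cutoff; a codon is kept only if all
                  three of its bases reach this quality.
    """
    kept = [codon for codon, quals in observations if min(quals) >= min_q]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Five reads covering one codon; one call has a low-quality third base.
reads = [
    ("ATG", (38, 37, 40)),
    ("ATG", (36, 39, 38)),
    ("ATG", (40, 40, 35)),
    ("ATA", (38, 37, 12)),  # likely sequencing error, filtered out
    ("GTG", (35, 36, 33)),
]
freqs = codon_frequencies(reads)
```

Counting whole codons rather than single bases keeps the linkage between neighbouring substitutions, which is the property the abstract highlights for interpreting amino acid changes.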
Affiliation(s)
- Bie M P Verbist
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Kim Thys
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Joke Reumers
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Yves Wetzels
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Koen Van der Borght
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Willem Talloen
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Jeroen Aerssens
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Lieven Clement
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| | - Olivier Thas
- Department of Mathematical Modeling, Statistics and Bioinformatics, Ghent University, Coupure Links 653, 9000 Gent, Janssen R&D, Janssen Pharmaceutical Companies of Johnson & Johnson, Turnhoutseweg 30, 2340 Beerse, Applied Mathematics, Informatics and Statistics, Ghent University, Krijgslaan 281 S9, 9000 Gent, Belgium and University of Wollongong, National Institute for Applied Statistics Research Australia (NIASRA), School of Mathematics and Applied Statistics, NSW 2522, Australia
| |
|
138
|
Sede MM, Moretti FA, Laufer NL, Jones LR, Quarleri JF. HIV-1 tropism dynamics and phylogenetic analysis from longitudinal ultra-deep sequencing data of CCR5- and CXCR4-using variants. PLoS One 2014; 9:e102857. [PMID: 25032817 PMCID: PMC4102574 DOI: 10.1371/journal.pone.0102857] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/16/2013] [Accepted: 06/25/2014] [Indexed: 11/25/2022] Open
Abstract
OBJECTIVE: Coreceptor switch from CCR5 to CXCR4 is associated with HIV disease progression. The molecular and evolutionary mechanisms underlying the CCR5 to CXCR4 switch are the focus of intense recent research. We studied the HIV-1 tropism dynamics in relation to coreceptor usage, the nature of quasispecies from ultra-deep sequencing (UDPS) data and their phylogenetic relationships. METHODS: Here, we characterized C2-V3-C3 sequences of HIV obtained from 19 patients followed up for 54 to 114 months using UDPS, with further genotyping and phylogenetic analysis for coreceptor usage. HIV quasispecies diversity and variability as well as HIV plasma viral load were measured longitudinally and their relationship with the HIV coreceptor usage was analyzed. The longitudinal UDPS data were submitted to phylogenetic analysis and sampling times and coreceptor usage were mapped onto the trees obtained. RESULTS: Although a temporal viral genetic structuring was evident, the persistence of several viral lineages evolving independently along the infection was statistically supported, indicating a complex scenario for the evolution of viral quasispecies. HIV X4-using variants were present in most of our patients, exhibiting a dissimilar inter- and intra-patient predominance as the component of quasispecies even on antiretroviral therapy. The viral populations from some of the patients studied displayed evidence of the evolution of X4 variants through fitness valleys, whereas for other patients the data favored a gradual mode of emergence. CONCLUSIONS: CXCR4 usage can emerge independently, in multiple lineages, along the course of HIV infection. The mode of emergence, i.e. gradual or through fitness valleys, seems to depend on both virus and patient factors. Furthermore, our analyses suggest that, besides becoming dominant after population-level switches, minor proportions of X4 viruses might exist along the infection, perhaps even at early stages of it. The fate of these minor variants might depend on both viral and host factors.
Affiliation(s)
- Mariano M. Sede
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Franco A. Moretti
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Natalia L. Laufer
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| | - Leandro R. Jones
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales, sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, Chubut, Argentina
| | - Jorge F. Quarleri
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
| |
|
139
|
Deakin CT, Deakin JJ, Ginn SL, Young P, Humphreys D, Suter CM, Alexander IE, Hallwirth CV. Impact of next-generation sequencing error on analysis of barcoded plasmid libraries of known complexity and sequence. Nucleic Acids Res 2014; 42:e129. [PMID: 25013183 PMCID: PMC4176369 DOI: 10.1093/nar/gku607] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022] Open
Abstract
Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes, when the exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and the feasibility of using contemporary next-generation sequencing technologies is unresolved. To explore this potential application empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced using an Illumina platform, with a 100-barcode library also sequenced using a SOLiD platform. Libraries containing 1 and 10 barcodes were distinguished from false barcodes generated by sequencing error by a several log-fold difference in abundance. In 100-barcode libraries, however, expected and false barcodes overlapped and could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs multiple false-positive barcodes appeared to be represented at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which potentially impact barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanism of which is yet to be fully resolved.
Affiliation(s)
- Claire T Deakin
- Gene Therapy Research Unit, Children's Medical Research Institute and The Children's Hospital at Westmead, Westmead, New South Wales 2145, Australia
| | - Jeffrey J Deakin
- Gene Therapy Research Unit, Children's Medical Research Institute and The Children's Hospital at Westmead, Westmead, New South Wales 2145, Australia
| | - Samantha L Ginn
- Gene Therapy Research Unit, Children's Medical Research Institute and The Children's Hospital at Westmead, Westmead, New South Wales 2145, Australia
| | - Paul Young
- Molecular Genetics Division, Victor Chang Cardiac Research Institute, Sydney, Darlinghurst, New South Wales 2010, Australia
| | - David Humphreys
- Molecular Genetics Division, Victor Chang Cardiac Research Institute, Sydney, Darlinghurst, New South Wales 2010, Australia
| | - Catherine M Suter
- Molecular Genetics Division, Victor Chang Cardiac Research Institute, Sydney, Darlinghurst, New South Wales 2010, Australia; Faculty of Medicine, University of New South Wales, Kensington, New South Wales 2052, Australia
| | - Ian E Alexander
- Gene Therapy Research Unit, Children's Medical Research Institute and The Children's Hospital at Westmead, Westmead, New South Wales 2145, Australia; Discipline of Paediatrics and Child Health, The Children's Hospital at Westmead Clinical School, The University of Sydney, Westmead, New South Wales 2145, Australia
| | - Claus V Hallwirth
- Gene Therapy Research Unit, Children's Medical Research Institute and The Children's Hospital at Westmead, Westmead, New South Wales 2145, Australia
| |
|
140
|
Di Giallonardo F, Töpfer A, Rey M, Prabhakaran S, Duport Y, Leemann C, Schmutz S, Campbell NK, Joos B, Lecca MR, Patrignani A, Däumer M, Beisel C, Rusert P, Trkola A, Günthard HF, Roth V, Beerenwinkel N, Metzner KJ. Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations. Nucleic Acids Res 2014; 42:e115. [PMID: 24972832 PMCID: PMC4132706 DOI: 10.1093/nar/gku537] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Indexed: 11/19/2022] Open
Abstract
Next-generation sequencing (NGS) technologies enable new insights into the diversity of virus populations within their hosts. Diversity estimation is currently restricted to single-nucleotide variants or to local fragments of no more than a few hundred nucleotides defined by the length of sequence reads. To study complex heterogeneous virus populations comprehensively, novel methods are required that allow for complete reconstruction of the individual viral haplotypes. Here, we show that assembly of whole viral genomes of ∼8600 nucleotides in length is feasible from mixtures of heterogeneous HIV-1 strains derived from defined combinations of cloned virus strains and from clinical samples of an HIV-1 superinfected individual. Haplotype reconstruction was achieved using optimized experimental protocols and computational methods for amplification, sequencing and assembly. We comparatively assessed the performance of the three NGS platforms 454 Life Sciences/Roche, Illumina and Pacific Biosciences for this task. Our results prove and delineate the feasibility of NGS-based full-length viral haplotype reconstruction and provide new tools for studying evolution and pathogenesis of viruses.
Affiliation(s)
- Francesca Di Giallonardo
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland; Life Science Zurich Graduate School, University of Zurich, 8057 Zurich, Switzerland
| | - Armin Töpfer
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| | - Melanie Rey
- Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
| | - Sandhya Prabhakaran
- Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
| | - Yannick Duport
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
| | - Christine Leemann
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
| | - Stefan Schmutz
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
| | - Nottania K Campbell
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland; Life Science Zurich Graduate School, University of Zurich, 8057 Zurich, Switzerland
| | - Beda Joos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
| | - Maria Rita Lecca
- Functional Genomics Center Zurich, University of Zurich, ETH Zurich, 8057 Zurich, Switzerland
| | - Andrea Patrignani
- Functional Genomics Center Zurich, University of Zurich, ETH Zurich, 8057 Zurich, Switzerland
| | - Martin Däumer
- Institut für Immunologie und Genetik, 67655 Kaiserslautern, Germany
| | - Christian Beisel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
| | - Peter Rusert
- Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
| | - Alexandra Trkola
- Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
| | - Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
| | - Volker Roth
- Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| | - Karin J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
| |
|
141
|
Quiñones-Mateu ME, Avila S, Reyes-Teran G, Martinez MA. Deep sequencing: becoming a critical tool in clinical virology. J Clin Virol 2014; 61:9-19. [PMID: 24998424 DOI: 10.1016/j.jcv.2014.06.013] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Received: 04/09/2014] [Revised: 06/12/2014] [Accepted: 06/14/2014] [Indexed: 02/07/2023]
Abstract
Population (Sanger) sequencing has been the standard method in basic and clinical DNA sequencing for almost 40 years; however, next-generation (deep) sequencing methodologies are now revolutionizing the field of genomics, and clinical virology is no exception. Deep sequencing is highly efficient, producing an enormous amount of information at low cost in a relatively short period of time. High-throughput sequencing techniques have enabled significant contributions to multiple areas in virology, including virus discovery and metagenomics (viromes), molecular epidemiology, pathogenesis, and studies of how viruses escape the host immune system and antiviral pressures. In addition, new and more affordable deep sequencing-based assays are now being implemented in clinical laboratories. Here, we review the use of the current deep sequencing platforms in virology, focusing on three of the most studied viruses: human immunodeficiency virus (HIV), hepatitis C virus (HCV), and influenza virus.
Affiliation(s)
- Miguel E Quiñones-Mateu
- University Hospital Translational Laboratory, University Hospitals Case Medical Center, Cleveland, OH, USA; Department of Pathology, Case Western Reserve University, Cleveland, OH, USA
| | - Santiago Avila
- Instituto Nacional de Enfermedades Respiratorias, Mexico City, Mexico; Centro de Investigaciones en Enfermedades Infecciosas, Mexico City, Mexico
| | - Gustavo Reyes-Teran
- Instituto Nacional de Enfermedades Respiratorias, Mexico City, Mexico; Centro de Investigaciones en Enfermedades Infecciosas, Mexico City, Mexico
| | - Miguel A Martinez
- Fundació irsicaixa, Universitat Autònoma de Barcelona, Hospital Universitari Germans Trias i Pujol, Badalona, Spain
| |
|
142
|
A bioinformatics pipeline for the analyses of viral escape dynamics and host immune responses during an infection. BioMed Research International 2014; 2014:264519. [PMID: 25013771 PMCID: PMC4072169 DOI: 10.1155/2014/264519] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/31/2014] [Accepted: 05/08/2014] [Indexed: 01/21/2023]
Abstract
Rapidly mutating viruses, such as hepatitis C virus (HCV) and HIV, have adopted evolutionary strategies that allow escape from the host immune response via genomic mutations. Recent advances in high-throughput sequencing are reshaping the field of immuno-virology of viral infections, as these allow fast and cheap generation of genomic data. However, due to the large volumes of data generated, a thorough understanding of the biological and immunological significance of such information is often difficult. This paper proposes a pipeline that allows visualization and statistical analysis of viral mutations that are associated with immune escape. Taking next-generation sequencing data from longitudinal analysis of HCV viral genomes during a single HCV infection, along with antigen-specific T-cell responses detected from the same subject, we demonstrate the applicability of these tools in the context of primary HCV infection. We provide a statistical and visual explanation of the relationship between co-occurring mutations on the viral genome and the parallel adaptive immune response against HCV.
|
143
|
HIV-1 quasispecies delineation by tag linkage deep sequencing. PLoS One 2014; 9:e97505. [PMID: 24842159 PMCID: PMC4026136 DOI: 10.1371/journal.pone.0097505] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/20/2014] [Accepted: 04/17/2014] [Indexed: 12/16/2022] Open
Abstract
Trade-offs between throughput, read length, and error rates in high-throughput sequencing limit certain applications such as monitoring viral quasispecies. Here, we describe a molecular-based tag linkage method that allows assemblage of short sequence reads into long DNA fragments. It enables haplotype phasing with high accuracy and sensitivity to interrogate individual viral sequences in a quasispecies. This approach is demonstrated to deduce ∼2000 unique 1.3 kb viral sequences from HIV-1 quasispecies in vivo and after passaging ex vivo with a detection limit of ∼0.005% to ∼0.001%. Reproducibility of the method is validated quantitatively and qualitatively by a technical replicate. This approach can improve monitoring of the genetic architecture and evolution dynamics in any quasispecies population.
|
144
|
Gregori J, Salicrú M, Domingo E, Sanchez A, Esteban JI, Rodríguez-Frías F, Quer J. Inference with viral quasispecies diversity indices: clonal and NGS approaches. Bioinformatics 2014; 30:1104-1111. [PMID: 24389655 DOI: 10.1093/bioinformatics/btt768] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.7] [Received: 09/05/2013] [Accepted: 12/25/2013] [Indexed: 02/07/2023] Open
Abstract
Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicons ultra-deep pyrosequencing (UDPS) raw data as a surrogate of a real hepatitis C virus viral population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods (CCSS and NGS) to guarantee statistically conforming conclusions as free of bias as possible. Contact: josep.gregori@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online.
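The two indices named in this abstract, the Shannon entropy and the mutation frequency, can be illustrated with a minimal Python sketch. The function names, the haplotype-count input format, and the two normalizations shown (dividing by the log of the number of observed haplotypes, or by the log of the number of reads or clones) are assumptions made for illustration and are not taken from the paper. The mutation frequency is assumed here to mean the fraction of sequenced bases that differ from a reference, for aligned sequences of equal length without indels.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of a list of haplotype read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_entropy(counts, by="haplotypes"):
    """Entropy divided by ln(#haplotypes) or ln(#reads); both are assumed conventions."""
    n_haplotypes = sum(1 for c in counts if c > 0)
    n_reads = sum(counts)
    denom = math.log(n_haplotypes) if by == "haplotypes" else math.log(n_reads)
    # A single haplotype (or a single read) has no diversity to normalize.
    return shannon_entropy(counts) / denom if denom > 0 else 0.0


def mutation_frequency(haplotype_counts, reference):
    """Fraction of sequenced bases that differ from the reference (hypothetical definition)."""
    mismatches = 0
    bases = 0
    for seq, count in haplotype_counts.items():
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the reference length")
        mismatches += count * sum(a != b for a, b in zip(seq, reference))
        bases += count * len(reference)
    return mismatches / bases


if __name__ == "__main__":
    counts = [50, 50]  # two equally abundant haplotypes
    print(shannon_entropy(counts))
    print(normalized_entropy(counts, by="haplotypes"))
    print(normalized_entropy(counts, by="reads"))
    print(mutation_frequency({"ACGT": 9, "ACGA": 1}, "ACGT"))
```

The two normalizations give different values for the same sample, and the read-based one depends on how many reads or clones were drawn. That dependence on sample size is the kind of comparability problem the abstract raises.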
Affiliation(s)
- Josep Gregori, Miquel Salicrú, Esteban Domingo, Alex Sanchez, Juan I Esteban, Francisco Rodríguez-Frías, Josep Quer
- Liver Unit, Internal Medicine Lab Malalties Hepàtiques, Vall d'Hebron Institut Recerca (VHIR-HUVH), 08035 Barcelona, Spain
- Roche Diagnostics SL, 08174, Sant Cugat del Vallès, Spain
- Statistics Department, Biology Faculty, Barcelona University, 08028, Barcelona, Spain
- CIBER de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
- Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Campus de Cantoblanco, 28049, Madrid, Spain
- Bioinformatics and Statistics Unit, Vall d'Hebron Institut Recerca (VHIR-HUVH), 08035, Barcelona, Spain
- Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
- Biochemistry Unit. Virology Unit/Microbiology Department, HUVH, 08035 Barcelona, Spain
|
145
|
Simen BB, Braverman MS, Abbate I, Aerssens J, Bidet Y, Bouchez O, Gabriel C, Izopet J, Kessler HH, Stelzl E, Di Giallonardo F, Schlapbach R, Radonic A, Paredes R, Recordon-Pinson P, Sakwa J, St John EP, Schmitz-Agheguian GG, Metzner KJ, Däumer MP. An international multicenter study on HIV-1 drug resistance testing by 454 ultra-deep pyrosequencing. J Virol Methods 2014; 204:31-7. [PMID: 24731928 DOI: 10.1016/j.jviromet.2014.04.007] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 10/01/2013] [Revised: 03/31/2014] [Accepted: 04/04/2014] [Indexed: 10/25/2022]
Abstract
The detection of mutant spectra within the viral quasispecies is critical for therapeutic management of HIV-1 infections. Routine clinical application of ultrasensitive genotyping requires reproducibility and concordance within and between laboratories. The goal of the study was to evaluate a new protocol on HIV-1 drug resistance testing by 454 ultra-deep pyrosequencing (454-UDS) in an international multicenter study. Sixteen blinded HIV-1 subtype B samples were provided for 454-UDS as both RNA and cDNA with viral titers of 88,600-573,000 HIV-1 RNA copies/ml. Eight overlapping amplicons spanning protease (PR) codons 10-99 and reverse transcriptase (RT) codons 1-251 were generated using molecular barcoded primers. 454-UDS was performed using the 454 Life Sciences/Roche GS FLX platform. PR and RT sequences were analyzed using 454 Life Sciences Amplicon Variant Analyzer (AVA) software. Quantified variation data were analyzed for intra-laboratory reproducibility and inter-laboratory concordance. Routine population sequencing was performed using the ViroSeq HIV-1 genotyping system. Eleven laboratories and the reference laboratory 454 Life Sciences sequenced the HIV-1 sample set. Data presented are derived from seven laboratories and the reference laboratory since severe study protocol execution errors occurred in four laboratories leading to exclusion. The median sequencing depth across all sites was 1364 reads per position (IQR=809-2065). 100% of the ViroSeq-reported mutations were also detected by 454-UDS. Minority HIV-1 drug resistance mutations, defined as HIV-1 drug resistance mutations identified at frequencies of 1-25%, were only detected by 454-UDS. The 10 preselected majority and minority mutations analyzed were consistently found across sites. The analysis of drug-resistance mutations detected between 1 and 10% demonstrated high intra- and inter-laboratory consistency in frequency estimates for both RNA and prepared cDNA samples, indicating robustness of the method. HIV-1 drug resistance testing using 454 ultra-deep pyrosequencing results in an accurate and highly reproducible, albeit complex, approach to the analysis of HIV-1 mutant spectra, even at frequencies well below those detected by routine population sequencing.
Affiliation(s)
- Isabella Abbate
- National Institute for Infectious Diseases "L. Spallanzani", Rome, Italy
- Jeroen Aerssens
- Janssen Infectious Diseases - Diagnostics bvba, Beerse, Belgium
- Yannick Bidet
- Centre Jean Perrin/Clermont University, Clermont-Ferrand, France
- Jacques Izopet
- INSERM U1043 and Virology Laboratory, CHU Toulouse, Toulouse, France
- Francesca Di Giallonardo
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Ralph Schlapbach
- Functional Genomics Center Zurich, University of Zurich, ETH Zurich, Zurich, Switzerland
- James Sakwa
- TIA-National Genomics Platform, Durban, South Africa
- Karin J Metzner
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Martin P Däumer
- Institute of Immunology and Genetics, Kaiserslautern, Germany
|
146
|
Massart S, Olmos A, Jijakli H, Candresse T. Current impact and future directions of high throughput sequencing in plant virus diagnostics. Virus Res 2014; 188:90-6. [PMID: 24717426 DOI: 10.1016/j.virusres.2014.03.029] [Citation(s) in RCA: 128] [Impact Index Per Article: 12.8] [Received: 01/24/2014] [Revised: 03/27/2014] [Accepted: 03/28/2014] [Indexed: 12/17/2022]
Abstract
The ability to provide a fast, inexpensive and reliable diagnostic for any given viral infection is a key parameter in efforts to fight and control these ubiquitous pathogens. The recent developments of high-throughput sequencing (also called Next Generation Sequencing - NGS) technologies and bioinformatics have drastically changed the research on viral pathogens. It is now raising a growing interest for virus diagnostics. This review provides a snapshot vision on the current use and impact of high throughput sequencing approaches in plant virus characterization. More specifically, this review highlights the potential of these new technologies and their interplay with current protocols in the future of molecular diagnostic of plant viruses. The current limitations that will need to be addressed for a wider adoption of high-throughput sequencing in plant virus diagnostics are thoroughly discussed.
Affiliation(s)
- Sebastien Massart
- Laboratory of Phytopathology, University of Liège, Gembloux Agro-BioTech, Passage des déportés, 2, 5030 Gembloux, Belgium
- Antonio Olmos
- Centro de Protección Vegetal, Instituto Valenciano de Investigaciones Agrarias (IVIA), Apartado Oficial, 46113 Moncada, Valencia, Spain
- Haissam Jijakli
- Laboratory of Phytopathology, University of Liège, Gembloux Agro-BioTech, Passage des déportés, 2, 5030 Gembloux, Belgium
- Thierry Candresse
- UMR 1332 de Biologie du fruit et Pathologie, INRA, CS20032, 33882 Villenave d'Ornon cedex, France; UMR 1332 de Biologie du fruit et Pathologie, Université de Bordeaux, CS20032, 33882 Villenave d'Ornon cedex, France
|
147
|
Strain-specific parallel evolution drives short-term diversification during Pseudomonas aeruginosa biofilm formation. Proc Natl Acad Sci U S A 2014; 111:E1419-27. [PMID: 24706926 DOI: 10.1073/pnas.1314340111] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Indexed: 01/14/2023] Open
Abstract
Generation of genetic diversity is a prerequisite for bacterial evolution and adaptation. Short-term diversification and selection within populations is, however, largely uncharacterised, as existing studies typically focus on fixed substitutions. Here, we use whole-genome deep-sequencing to capture the spectrum of mutations arising during biofilm development for two Pseudomonas aeruginosa strains. This approach identified single nucleotide variants with frequencies from 0.5% to 98.0% and showed that the clinical strain 18A exhibits greater genetic diversification than the type strain PA01, despite its lower per base mutation rate. Mutations were found to be strain specific: the mucoid strain 18A experienced mutations in alginate production genes and a c-di-GMP regulator gene; while PA01 acquired mutations in PilT and PilY1, possibly in response to a rapid expansion of a lytic Pf4 bacteriophage, which may use type IV pili for infection. The Pf4 population diversified with an evolutionary rate of 2.43 × 10⁻³ substitutions per site per day, which is comparable to single-stranded RNA viruses. Extensive within-strain parallel evolution, often involving identical nucleotides, was also observed indicating that mutation supply is not limiting, which was contrasted by an almost complete lack of noncoding and synonymous mutations. Taken together, these results suggest that the majority of the P. aeruginosa genome is constrained by negative selection, with strong positive selection acting on an accessory subset of genes that facilitate adaptation to the biofilm lifecycle. Long-term bacterial evolution is known to proceed via few, nonsynonymous, positively selected mutations, and here we show that similar dynamics govern short-term, within-population bacterial diversification.
|
148
|
Kao RR, Haydon DT, Lycett SJ, Murcia PR. Supersize me: how whole-genome sequencing and big data are transforming epidemiology. Trends Microbiol 2014; 22:282-91. [PMID: 24661923 PMCID: PMC7125769 DOI: 10.1016/j.tim.2014.02.011] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Received: 12/08/2013] [Revised: 02/17/2014] [Accepted: 02/24/2014] [Indexed: 01/08/2023]
Abstract
Whole-genome sequencing is used for forensic epidemiology. Big data can transform forensic epidemiology. Clustering, biases, wildlife reservoirs, and emerging infections can all be addressed. Phylodynamics approaches to integrate epidemiological and evolutionary data have been highly successful but still face scientific challenges.
In epidemiology, the identification of ‘who infected whom’ allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates, duration of infectiousness, and the existence of high-risk groups. Although invaluable, the existence of many plausible infection pathways makes this difficult, and epidemiological contact tracing either uncertain, logistically prohibitive, or both. The recent advent of next-generation sequencing technology allows the identification of traceable differences in the pathogen genome that are transforming our ability to understand high-resolution disease transmission, sometimes even down to the host-to-host scale. We review recent examples of the use of pathogen whole-genome sequencing for the purpose of forensic tracing of transmission pathways, focusing on the particular problems where evolutionary dynamics must be supplemented by epidemiological information on the most likely timing of events as well as possible transmission pathways. We also discuss potential pitfalls in the over-interpretation of these data, and highlight the manner in which a confluence of this technology with sophisticated mathematical and statistical approaches has the potential to produce a paradigm shift in our understanding of infectious disease transmission and control.
Affiliation(s)
- Rowland R Kao
- Boyd Orr Centre for Population and Ecosystem Health, College of Medical Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
- Daniel T Haydon
- Boyd Orr Centre for Population and Ecosystem Health, College of Medical Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
- Samantha J Lycett
- Boyd Orr Centre for Population and Ecosystem Health, College of Medical Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
- Pablo R Murcia
- Medical Research Council (MRC) Centre for Virus Research, College of Medical, Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
|
149
|
Sijmons S, Van Ranst M, Maes P. Genomic and functional characteristics of human cytomegalovirus revealed by next-generation sequencing. Viruses 2014; 6:1049-72. [PMID: 24603756 PMCID: PMC3970138 DOI: 10.3390/v6031049] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 01/20/2014] [Revised: 02/11/2014] [Accepted: 02/11/2014] [Indexed: 01/08/2023] Open
Abstract
The complete genome of human cytomegalovirus (HCMV) was elucidated almost 25 years ago using a traditional cloning and Sanger sequencing approach. Analysis of the genetic content of additional laboratory and clinical isolates has led to a better, albeit still incomplete, definition of the coding potential and diversity of wild-type HCMV strains. The introduction of a new generation of massively parallel sequencing technologies, collectively called next-generation sequencing, has profoundly increased the throughput and resolution of the genomics field. These increased possibilities are already leading to a better understanding of the circulating diversity of HCMV clinical isolates. The higher resolution of next-generation sequencing provides new opportunities in the study of intrahost viral population structures. Furthermore, deep sequencing enables novel diagnostic applications for sensitive drug resistance mutation detection. RNA-seq applications have changed the picture of the HCMV transcriptome, which resulted in proof of a vast amount of splicing events and alternative transcripts. This review discusses the application of next-generation sequencing technologies, which has provided a clearer picture of the intricate nature of the HCMV genome. The continuing development and application of novel sequencing technologies will further augment our understanding of this ubiquitous, but elusive, herpesvirus.
Affiliation(s)
- Steven Sijmons
- Laboratory of Clinical Virology, Rega Institute for Medical Research, K.U.Leuven, Minderbroedersstraat 10, Leuven BE-3000, Belgium
- Marc Van Ranst
- Laboratory of Clinical Virology, Rega Institute for Medical Research, K.U.Leuven, Minderbroedersstraat 10, Leuven BE-3000, Belgium
- Piet Maes
- Laboratory of Clinical Virology, Rega Institute for Medical Research, K.U.Leuven, Minderbroedersstraat 10, Leuven BE-3000, Belgium
|
150
|
Töpfer A, Marschall T, Bull RA, Luciani F, Schönhuth A, Beerenwinkel N. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol 2014; 10:e1003515. [PMID: 24675810 PMCID: PMC3967922 DOI: 10.1371/journal.pcbi.1003515] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 10/28/2013] [Accepted: 01/31/2014] [Indexed: 11/25/2022] Open
Abstract
Virus populations can display high genetic diversity within individual hosts. The intra-host collection of viral haplotypes, called viral quasispecies, is an important determinant of virulence, pathogenesis, and treatment outcome. We present HaploClique, a computational approach to reconstruct the structure of a viral quasispecies from next-generation sequencing data as obtained from bulk sequencing of mixed virus samples. We develop a statistical model for paired-end reads accounting for mutations, insertions, and deletions. Using an iterative maximal clique enumeration approach, read pairs are assembled into haplotypes of increasing length, eventually enabling global haplotype assembly. The performance of our quasispecies assembly method is assessed on simulated data for varying population characteristics and sequencing technology parameters. Owing to its paired-end handling, HaploClique compares favorably to state-of-the-art haplotype inference methods. It can reconstruct error-free full-length haplotypes from low coverage samples and detect large insertions and deletions at low frequencies. We applied HaploClique to sequencing data derived from a clinical hepatitis C virus population of an infected patient and discovered a novel deletion of length 357±167 bp that was validated by two independent long-read sequencing experiments. HaploClique is available at https://github.com/armintoepfer/haploclique. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2-5.
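The core graph operation named in this abstract, maximal clique enumeration, can be sketched in a few lines of Python. This is a generic Bron-Kerbosch enumeration with pivoting, not HaploClique's implementation: the statistical model that decides which reads are compatible and the iterative merging of cliques into longer haplotypes are not reproduced. The read names and the toy compatibility graph are hypothetical.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph given as {node: set(neighbours)}."""

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: already explored nodes
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the node with most neighbours among the candidates to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    # Hypothetical compatibility graph: an edge means two reads overlap and agree.
    reads = {
        "r1": {"r2", "r3"},
        "r2": {"r1", "r3"},
        "r3": {"r1", "r2", "r4"},
        "r4": {"r3"},
    }
    for clique in maximal_cliques(reads):
        print(sorted(clique))
```

In a read graph of this kind, each maximal clique is a set of mutually compatible reads that could be merged into one longer sequence. Repeating that merge on the resulting sequences corresponds to the iterative assembly the abstract describes.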
Affiliation(s)
- Armin Töpfer
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Rowena A. Bull
- Inflammation and Infection Research Centre, School of Medical Sciences, UNSW, Sydney, Australia
- Fabio Luciani
- Inflammation and Infection Research Centre, School of Medical Sciences, UNSW, Sydney, Australia
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|