1
Freire B, Ladra S, Parama JR, Salmela L. ViQUF: De Novo Viral Quasispecies Reconstruction Using Unitig-Based Flow Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1550-1562. [PMID: 35853050 DOI: 10.1109/tcbb.2022.3190282] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
During viral infection, intrahost mutation and recombination can lead to significant evolution, resulting in a population of viruses that harbor multiple haplotypes. The task of reconstructing these haplotypes from short-read sequencing data is called viral quasispecies assembly, and it can be categorized as a multiassembly problem. We consider the de novo version of the problem, where no reference is available. We present ViQUF, a de novo viral quasispecies assembler that addresses haplotype assembly and quantification. ViQUF obtains a first draft of the assembly graph from a de Bruijn graph. Then, solving a min-cost flow over a flow network built for each pair of adjacent vertices based on their paired-end information creates an approximate paired assembly graph with suggested frequency values as edge labels, which is the first frequency estimation. Then, original haplotypes are obtained through a greedy path reconstruction guided by a min-cost flow solution in the approximate paired assembly graph. ViQUF outputs the contigs with their frequency estimations. Results on real and simulated data show that ViQUF is at least four times faster using at most half of the memory than previous methods, while maintaining, and in some cases outperforming, the high quality of assembly and frequency estimation of overlap graph-based methodologies, which are known to be more accurate but slower than the de Bruijn graph-based approaches.
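The first step named in the abstract, drafting an assembly graph from a de Bruijn graph, can be illustrated with a minimal sketch. This is a toy under stated assumptions and not ViQUF's implementation: the function names `build_de_bruijn` and `branching_nodes`, the value of `k`, and the two example reads are all hypothetical. Nodes are (k-1)-mers, each distinct k-mer contributes one edge, and k-mer multiplicities stand in for the coverage signal that later frequency estimation would use.

```python
from collections import Counter, defaultdict


def build_de_bruijn(reads, k):
    """Return (edges, kmer_counts) for a toy de Bruijn graph.

    edges maps each (k-1)-mer to the set of (k-1)-mers that follow it;
    kmer_counts records how many times each k-mer occurs in the reads.
    """
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    edges = defaultdict(set)
    for kmer in kmer_counts:
        # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
        edges[kmer[:-1]].add(kmer[1:])
    return edges, kmer_counts


def branching_nodes(edges):
    """Nodes with more than one successor, where haplotypes may diverge."""
    return sorted(node for node, succ in edges.items() if len(succ) > 1)


# Two hypothetical reads that differ at a single position.
reads = ["ACGTAC", "ACGTTC"]
edges, kmer_counts = build_de_bruijn(reads, k=3)
branches = branching_nodes(edges)
```

In this toy input the single differing base produces one branching node, which is the kind of ambiguity the abstract says is resolved with paired-end information and min-cost flow.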
2
Antipov D, Rayko M, Kolmogorov M, Pevzner PA. viralFlye: assembling viruses and identifying their hosts from long-read metagenomics data. Genome Biol 2022; 23:57. [PMID: 35189932 PMCID: PMC8862349 DOI: 10.1186/s13059-021-02566-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/29/2021] [Accepted: 12/03/2021] [Indexed: 11/10/2022]
Abstract
Although the use of long-read sequencing improves the contiguity of assembled viral genomes compared to short-read methods, assembling complex viral communities remains an open problem. We describe the viralFlye tool for identification and analysis of metagenome-assembled viruses in long-read assemblies. We show it significantly improves viral assemblies and demonstrate that long-reads result in a much larger array of predicted virus-host associations as compared to short-read assemblies. We demonstrate that the identification of novel CRISPR arrays in bacterial genomes from a newly assembled metagenomic sample provides information for predicting novel hosts for novel viruses.
Affiliation(s)
- Dmitry Antipov
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Mikhail Rayko
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Mikhail Kolmogorov
  - Department of Computer Science and Engineering, University of California at San Diego, La Jolla, USA
- Pavel A Pevzner
  - Department of Computer Science and Engineering, University of California at San Diego, La Jolla, USA
3
Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021; 49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]
Abstract
Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient’s treatment plan preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that still remains open. Here we propose CliqueSNV based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with the frequency below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short and long bases sequencing platforms.
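The idea of extracting pairs of linked mutations from noisy reads can be sketched as follows. This is a toy illustration only and not CliqueSNV's algorithm: the abstract does not give the statistical test used, so a plain co-occurrence count with a hypothetical `min_support` threshold stands in for it, and the function name, consensus sequence and reads are invented for the example. Reads are assumed to be already aligned and of equal length.

```python
from collections import Counter
from itertools import combinations


def linked_pair_counts(reads, consensus):
    """Count how often two non-consensus bases occur together in one read."""
    pair_counts = Counter()
    for read in reads:
        # Positions where this read disagrees with the consensus.
        variants = [
            (pos, base)
            for pos, (base, ref) in enumerate(zip(read, consensus))
            if base != ref
        ]
        for first, second in combinations(variants, 2):
            pair_counts[(first, second)] += 1
    return pair_counts


consensus = "ACGTACGT"
reads = [
    "ACGTACGT",  # matches the consensus
    "ATGTACTT",  # variants at positions 1 and 6
    "ATGTACTT",  # the same pair again
    "ACGTACTT",  # only position 6 differs
    "ACGAACGT",  # isolated difference, likely noise
]
pair_counts = linked_pair_counts(reads, consensus)

min_support = 2  # hypothetical threshold, not taken from the paper
linked = [pair for pair, n in pair_counts.items() if n >= min_support]
```

Isolated differences never form a pair, so they drop out, while mutations that repeatedly travel together are retained as candidate haplotype signal.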
Affiliation(s)
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
  - Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
- Viachaslau Tsyvina
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Anupama Shankar
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Andrew Melnyk
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Tatiana Malygina
  - International Scientific and Research Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
- Yuri B Porozov
  - World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
  - Department of Computational Biology, Sirius University of Science and Technology, Sochi 354340, Russia
- Ellsworth M Campbell
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- William M Switzer
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
  - World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
4
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023]
Abstract
The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.
5
Basodi S, Baykal PI, Zelikovsky A, Skums P, Pan Y. Analysis of heterogeneous genomic samples using image normalization and machine learning. BMC Genomics 2020; 21:405. [PMID: 33349236 PMCID: PMC7751093 DOI: 10.1186/s12864-020-6661-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 11/10/2022]
Abstract
Background
Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.
Results
We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. Such a representation allows the problems of classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on the data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy.
Conclusions
The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.
Affiliation(s)
- Sunitha Basodi
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Pelin Icer Baykal
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 11991, Russia
- Pavel Skums
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Yi Pan
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
6
Eliseev A, Gibson KM, Avdeyev P, Novik D, Bendall ML, Pérez-Losada M, Alexeev N, Crandall KA. Evaluation of haplotype callers for next-generation sequencing of viruses. INFECTION, GENETICS AND EVOLUTION : JOURNAL OF MOLECULAR EPIDEMIOLOGY AND EVOLUTIONARY GENETICS IN INFECTIOUS DISEASES 2020; 82:104277. [PMID: 32151775 PMCID: PMC7293574 DOI: 10.1016/j.meegid.2020.104277] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/04/2019] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 01/30/2023]
Abstract
Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence, thus confounding useful intra-host diversity information for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity with a variety of lower frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations that spanned known levels of viral genetic diversity, including mutation rates, sample size and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially HIV-1). All twelve investigated haplotype callers showed variable performance and produced drastically different results that were mainly driven by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen in intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools, except QuasiRecomb and ShoRAH, greatly underestimated intra-host diversity and the true number of haplotypes.
PredictHaplo outperformed, in regard to highest precision, recall, and lowest UniFrac distance values, the other haplotype reconstruction tools followed by CliqueSNV, which, given more computational time, may have outperformed PredictHaplo. Here, we present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.
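Precision and recall, two of the metrics named above, can be made concrete with a short sketch. It assumes the simplest possible matching rule, in which a predicted haplotype counts as correct only if it is identical to a true one; the abstract does not state the matching criterion the benchmark used, and the function name and sequences below are hypothetical.

```python
def precision_recall(predicted, truth):
    """Precision and recall of predicted haplotypes under exact matching."""
    predicted, truth = set(predicted), set(truth)
    matched = predicted & truth
    # Precision: share of predictions that are real haplotypes.
    precision = len(matched) / len(predicted) if predicted else 0.0
    # Recall: share of real haplotypes that were recovered.
    recall = len(matched) / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical population of four haplotypes and a caller that reports three.
truth = ["ACGT", "ACGA", "TCGA", "ACCA"]
predicted = ["ACGT", "ACGA", "GGGG"]
precision, recall = precision_recall(predicted, truth)
```

A caller that underestimates the number of haplotypes, as most tools in the study did, can keep precision high while recall falls, which is why the two are reported together.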
Affiliation(s)
- Anton Eliseev
  - Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Keylie M Gibson
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
- Pavel Avdeyev
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
  - Department of Mathematics, George Washington University, Washington, DC, USA
- Dmitry Novik
  - Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Matthew L Bendall
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
- Marcos Pérez-Losada
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
  - CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
- Nikita Alexeev
  - Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia
- Keith A Crandall
  - Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
  - Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA
7
Baaijens JA, Van der Roest B, Köster J, Stougie L, Schönhuth A. Full-length de novo viral quasispecies assembly through variation graph construction. Bioinformatics 2019; 35:5086-5094. [DOI: 10.1093/bioinformatics/btz443] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 08/11/2018] [Revised: 04/17/2019] [Accepted: 05/27/2019] [Indexed: 11/14/2022]
Abstract
Motivation
Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference genome independent (‘de novo’) approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. The remaining challenge is to reconstruct full-length haplotypes together with their abundances from such contigs.
Results
We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is optimal in terms of being compatible with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal-length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers.
Availability and implementation
Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg.
Supplementary information
Supplementary data are available at Bioinformatics online.
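The minimization step described under Results can be illustrated with a small sketch: given candidate paths through a graph and a read coverage per node, choose path abundances so that the coverage they explain matches the observed coverage as closely as possible. Everything here is an assumption made for illustration rather than Virus-VG's actual formulation or solver: the absolute-error objective, the exhaustive search over a coarse grid of abundance levels, the function names, and the four-node example.

```python
from itertools import product


def coverage_error(paths, abundances, node_cov):
    """Sum over nodes of |observed coverage - coverage explained by paths|."""
    explained = {node: 0.0 for node in node_cov}
    for path, abundance in zip(paths, abundances):
        for node in path:
            explained[node] += abundance
    return sum(abs(node_cov[node] - explained[node]) for node in node_cov)


def best_abundances(paths, node_cov, levels):
    """Try every combination of abundance levels and keep the best fit."""
    return min(
        product(levels, repeat=len(paths)),
        key=lambda candidate: coverage_error(paths, candidate, node_cov),
    )


# Hypothetical bubble: two haplotypes share nodes a and d, differ at b versus c.
node_cov = {"a": 100, "b": 70, "c": 30, "d": 100}
paths = [("a", "b", "d"), ("a", "c", "d")]
levels = range(0, 101, 10)

best = best_abundances(paths, node_cov, levels)
residual = coverage_error(paths, best, node_cov)
```

Exhaustive search is only workable for a handful of paths; it is used here because it keeps the objective easy to read, not because it reflects how the published tool solves the problem.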
Affiliation(s)
- Jasmijn A Baaijens
  - Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
- Johannes Köster
  - Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
  - Medical Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Leen Stougie
  - Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
  - Department of Econometrics and Operations Research, Vrije Universiteit, Amsterdam, Netherlands
  - INRIA-Erable, Grenoble, France
- Alexander Schönhuth
  - Life Sciences and Health Group, Centrum Wiskunde & Informatica, Amsterdam, Netherlands
  - INRIA-Erable, Grenoble, France
  - Theoretical Biology and Bioinformatics, Utrecht University, Utrecht, Netherlands
8
Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons. PLoS Comput Biol 2018; 14:e1006498. [PMID: 30543621 PMCID: PMC6314628 DOI: 10.1371/journal.pcbi.1006498] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/04/2018] [Revised: 01/02/2019] [Accepted: 09/10/2018] [Indexed: 01/07/2023]
Abstract
Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. 
The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018. Viral populations constantly evolve and diversify. In this article we introduce a method, FLEA, for reconstructing and visualizing the details of evolutionary changes. FLEA specifically processes data from sequencing platforms that generate reads that are long, but error-prone. To study the evolutionary dynamics of entire genes during viral infection, data is collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time. However, the experimental and sequencing process is imperfect, so the resulting data contain not only real evolutionary changes, but also mutations and other genetic artifacts caused by sequencing errors. Our method corrects most of these errors by combining thousands of erroneous sequences into a much smaller number of unique consensus sequences that represent biologically meaningful variation. The resulting high-quality sequences are used for further analysis, such as building an evolutionary tree that tracks and interprets the genetic changes in the viral population over time. FLEA is open source, and is freely available online.
9
Differences in the Selection Bottleneck between Modes of Sexual Transmission Influence the Genetic Composition of the HIV-1 Founder Virus. PLoS Pathog 2016; 12:e1005619. [PMID: 27163788 PMCID: PMC4862634 DOI: 10.1371/journal.ppat.1005619] [Citation(s) in RCA: 73] [Impact Index Per Article: 9.1] [Received: 10/27/2015] [Accepted: 04/18/2016] [Indexed: 01/18/2023]
Abstract
Due to the stringent population bottleneck that occurs during sexual HIV-1 transmission, systemic infection is typically established by a limited number of founder viruses. Elucidation of the precise forces influencing the selection of founder viruses may reveal key vulnerabilities that could aid in the development of a vaccine or other clinical interventions. Here, we utilize deep sequencing data and apply a genetic distance-based method to investigate whether the mode of sexual transmission shapes the nascent founder viral genome. Analysis of 74 acute and early HIV-1 infected subjects revealed that 83% of men who have sex with men (MSM) exhibit a single founder virus, levels similar to those previously observed in heterosexual (HSX) transmission. In a metadata analysis of a total of 354 subjects, including HSX, MSM and injecting drug users (IDU), we also observed no significant differences in the frequency of single founder virus infections between HSX and MSM transmissions. However, comparison of HIV-1 envelope sequences revealed that HSX founder viruses exhibited a greater number of codon sites under positive selection, as well as stronger transmission indices possibly reflective of higher fitness variants. Moreover, specific genetic “signatures” within MSM and HSX founder viruses were identified, with single polymorphisms within gp41 enriched among HSX viruses while more complex patterns, including clustered polymorphisms surrounding the CD4 binding site, were enriched in MSM viruses. While our findings do not support an influence of the mode of sexual transmission on the number of founder viruses, they do demonstrate that there are marked differences in the selection bottleneck that can significantly shape their genetic composition. This study illustrates the complex dynamics of the transmission bottleneck and reveals that distinct genetic bottleneck processes exist dependent upon the mode of HIV-1 transmission. 
While the global spread of HIV-1 has been fueled by sexual transmission, the genetic determinants underlying the transmission bottleneck remain poorly understood. Here we characterized founder virus population diversity from next generation sequencing data in a cohort of 74 acute and early HIV-1 infected individuals. We observe that the risk of multi-variant infection in men-who-have-sex-with-men (MSM) is not greater than that observed for heterosexuals (HSX), contrary to reports of higher rates of multiple founder virus infections in higher-risk MSM transmissions. These findings were further supported through a metadata analysis of 354 acute and early HIV-1 subjects. We did, however, observe differences between HSX and MSM founder viruses, including a higher selection barrier in HSX transmission, with founder viruses being more cohort consensus-like, which may reflect increased replicative fitness. We also identified a number of residues within Envelope that behave in a risk-dependent manner and could be key for HIV-1 transmission. These novel insights improve our understanding of the HIV-1 transmission bottleneck and underscore the differential selective pressures that founder viruses within the two major transmission risk groups are subjected to.
10
Chedom DF, Murcia PR, Greenman CD. Inferring the Clonal Structure of Viral Populations from Time Series Sequencing. PLoS Comput Biol 2015; 11:e1004344. [PMID: 26571026 PMCID: PMC4646700 DOI: 10.1371/journal.pcbi.1004344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2014] [Accepted: 05/17/2015] [Indexed: 11/18/2022]
Abstract
RNA virus populations undergo processes of mutation and selection, resulting in a mixed population of viral particles. High-throughput sequencing of a viral population subsequently contains a mixed signal of the underlying clones. We would like to identify the underlying evolutionary structures. We utilize two sources of information to attempt this: within-segment linkage information and mutation prevalence. We demonstrate that clone haplotypes, their prevalence, and maximum parsimony reticulate evolutionary structures can be identified, although the solutions may not be unique, even for complete sets of information. This is applied to a chain of influenza infection, where we infer evolutionary structures, including reassortment, and demonstrate some of the difficulties of interpretation that arise from deep sequencing due to artifacts such as template switching during PCR amplification.
Affiliation(s)
- Donatien F. Chedom
  - The Genome Analysis Centre, Norwich Research Park, Norwich, United Kingdom
- Pablo R. Murcia
  - MRC-University of Glasgow Centre for Virus Research, United Kingdom
- Chris D. Greenman
  - The Genome Analysis Centre, Norwich Research Park, Norwich, United Kingdom
  - School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
11
High-resolution genetic profile of viral genomes: why it matters. Curr Opin Virol 2015; 14:62-70. [DOI: 10.1016/j.coviro.2015.08.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/19/2015] [Revised: 08/07/2015] [Accepted: 08/07/2015] [Indexed: 12/12/2022]
12
Ma J, Pallett D, Jiang H, Hou Y, Wang H. Mutational bias of Turnip Yellow Mosaic Virus in the context of host anti-viral gene silencing. Virology 2015; 486:2-6. [PMID: 26379088 DOI: 10.1016/j.virol.2015.08.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/04/2015] [Revised: 05/01/2015] [Accepted: 08/21/2015] [Indexed: 01/04/2023]
Abstract
Plant Dicer-like (DCL) enzymes exhibit a GC-preference during anti-viral post-transcriptional gene silencing (PTGS), delivering an evolutionary selection pressure resulting in plant viruses with GC-poor genomes. However, some viruses, e.g. Turnip Yellow Mosaic Virus (TYMV, genus Tymovirus), have GC-rich genomes, raising the question as to whether or not DCL-derived selection pressure affects these viruses. In this study, analysis of the virus-derived small interfering RNAs from TYMV-infected leaves of Brassica juncea showed that the TYMV population accumulated a mutational bias with AU replacing GC (GC-AU), demonstrating PTGS pressure. Interestingly, at the highly polymorphic sites the GC-AU bias was no longer observed. This suggests the presence of an unknown mechanism preventing mutational drift of the viral population and maintaining viral genome stability, despite the host PTGS pressure.
Affiliation(s)
- Jinmin Ma
  - BGI-shenzhen, Beishan Road, Yantian, Shenzhen 518083, China
- Denise Pallett
  - NERC/Centre for Ecology & Hydrology, Benson Lane, Wallingford, Oxfordshire OX10 8BB, UK
- Hui Jiang
  - BGI-shenzhen, Beishan Road, Yantian, Shenzhen 518083, China
- Yong Hou
  - BGI-shenzhen, Beishan Road, Yantian, Shenzhen 518083, China
- Hui Wang
  - BGI-shenzhen, Beishan Road, Yantian, Shenzhen 518083, China
  - NERC/Centre for Ecology & Hydrology, Benson Lane, Wallingford, Oxfordshire OX10 8BB, UK
  - Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
| |
|
13
|
Rossi LMG, Escobar-Gutierrez A, Rahal P. Advanced molecular surveillance of hepatitis C virus. Viruses 2015; 7:1153-88. [PMID: 25781918 PMCID: PMC4379565 DOI: 10.3390/v7031153] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/17/2014] [Revised: 02/05/2015] [Accepted: 02/20/2015] [Indexed: 12/12/2022] Open
Abstract
Hepatitis C virus (HCV) infection is an important public health problem worldwide. HCV exploits complex molecular mechanisms, which result in a high degree of intrahost genetic heterogeneity. This high degree of variability represents a challenge for the accurate establishment of genetic relatedness between cases and complicates the identification of sources of infection. Tracking HCV infections is crucial for the elucidation of routes of transmission in a variety of settings. Therefore, implementation of HCV advanced molecular surveillance (AMS) is essential for disease control. Accounting for virulence is also important for HCV AMS and both viral and host factors contribute to the disease outcome. Therefore, HCV AMS requires the incorporation of host factors as an integral component of the algorithms used to monitor disease occurrence. Importantly, implementation of comprehensive global databases and data mining are also needed for the proper study of the mechanisms responsible for HCV transmission. Here, we review molecular aspects associated with HCV transmission, as well as the most recent technological advances used for virus and host characterization. Additionally, the cornerstone discoveries that have defined the pathway for viral characterization are presented and the importance of implementing advanced HCV molecular surveillance is highlighted.
Affiliation(s)
- Livia Maria Gonçalves Rossi
  - Department of Biology, Institute of Bioscience, Language and Exact Science, Sao Paulo State University, Sao Jose do Rio Preto, SP 15054-000, Brazil.
- Paula Rahal
  - Department of Biology, Institute of Bioscience, Language and Exact Science, Sao Paulo State University, Sao Jose do Rio Preto, SP 15054-000, Brazil.
|
14
|
Illingworth CJR, Fischer A, Mustonen V. Identifying selection in the within-host evolution of influenza using viral sequence data. PLoS Comput Biol 2014; 10:e1003755. [PMID: 25080215 PMCID: PMC4117419 DOI: 10.1371/journal.pcbi.1003755] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 12/09/2013] [Accepted: 06/13/2014] [Indexed: 02/07/2023] Open
Abstract
The within-host evolution of influenza is a vital component of its epidemiology. A question of particular interest is the role that selection plays in shaping the viral population over the course of a single infection. We here describe a method to measure selection acting upon the influenza virus within an individual host, based upon time-resolved genome sequence data from an infection. Analysing sequence data from a transmission study conducted in pigs, describing part of the haemagglutinin gene (HA1) of an influenza virus, we find signatures of non-neutrality in six of a total of sixteen infections. We find evidence for both positive and negative selection acting upon specific alleles, while in three cases, the data suggest the presence of time-dependent selection. In one infection we observe what is potentially a specific immune response against the virus; a non-synonymous mutation in an epitope region of the virus is found to be under initially positive, then strongly negative selection. Crucially, given the lack of homologous recombination in influenza, our method accounts for linkage disequilibrium between nucleotides at different positions in the haemagglutinin gene, allowing for the analysis of populations in which multiple mutations are present at any given time. Our approach offers a new insight into the dynamics of influenza infection, providing a detailed characterisation of the forces that underlie viral evolution.
The evolution of the influenza virus is of great importance for human health. Through evolution, current influenza viruses develop the ability to infect people who have been vaccinated against earlier strains. New strains of influenza that infect birds and pigs could evolve to infect and spread between people, causing a global pandemic. The influenza virus lives within a human or animal host, so that viral evolution happens within, or in the spread between, individuals. As such, what happens to the virus during the course of an infection is a question of great interest. We here describe a statistical method that uses viral genome sequence data to measure how evolution affects the influenza virus within a single host. Studying data from infections transmitted between pigs, we find evidence for evolutionary adaptation in six of sixteen animals for which data were available. In one case, an immune response mounted by a pig against the virus is apparent. Our method provides a statistical framework for using sequence data to study viral evolution on very short timescales, enabling new research into within-host viral evolution.
Affiliation(s)
- Andrej Fischer
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
- Ville Mustonen
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge, United Kingdom
|
15
|
Giallonardo FD, Töpfer A, Rey M, Prabhakaran S, Duport Y, Leemann C, Schmutz S, Campbell NK, Joos B, Lecca MR, Patrignani A, Däumer M, Beisel C, Rusert P, Trkola A, Günthard HF, Roth V, Beerenwinkel N, Metzner KJ. Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations. Nucleic Acids Res 2014; 42:e115. [PMID: 24972832 PMCID: PMC4132706 DOI: 10.1093/nar/gku537] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Indexed: 11/19/2022] Open
Abstract
Next-generation sequencing (NGS) technologies enable new insights into the diversity of virus populations within their hosts. Diversity estimation is currently restricted to single-nucleotide variants or to local fragments of no more than a few hundred nucleotides defined by the length of sequence reads. To study complex heterogeneous virus populations comprehensively, novel methods are required that allow for complete reconstruction of the individual viral haplotypes. Here, we show that assembly of whole viral genomes of ∼8600 nucleotides length is feasible from mixtures of heterogeneous HIV-1 strains derived from defined combinations of cloned virus strains and from clinical samples of an HIV-1 superinfected individual. Haplotype reconstruction was achieved using optimized experimental protocols and computational methods for amplification, sequencing and assembly. We comparatively assessed the performance of the three NGS platforms 454 Life Sciences/Roche, Illumina and Pacific Biosciences for this task. Our results prove and delineate the feasibility of NGS-based full-length viral haplotype reconstruction and provide new tools for studying evolution and pathogenesis of viruses.
Affiliation(s)
- Francesca Di Giallonardo
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland; Life Science Zurich Graduate School, University of Zurich, 8057 Zurich, Switzerland
- Armin Töpfer
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Melanie Rey
  - Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
- Sandhya Prabhakaran
  - Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
- Yannick Duport
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Christine Leemann
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Stefan Schmutz
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Nottania K Campbell
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland; Life Science Zurich Graduate School, University of Zurich, 8057 Zurich, Switzerland
- Beda Joos
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Maria Rita Lecca
  - Functional Genomics Center Zurich, University of Zurich, ETH Zurich, 8057 Zurich, Switzerland
- Andrea Patrignani
  - Functional Genomics Center Zurich, University of Zurich, ETH Zurich, 8057 Zurich, Switzerland
- Martin Däumer
  - Institut für Immunologie und Genetik, 67655 Kaiserslautern, Germany
- Christian Beisel
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- Peter Rusert
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Alexandra Trkola
  - Institute of Medical Virology, University of Zurich, 8057 Zurich, Switzerland
- Huldrych F Günthard
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
- Volker Roth
  - Department of Mathematics and Computer Science, University of Basel, 4056 Basel, Switzerland
- Niko Beerenwinkel
  - Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Karin J Metzner
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
|
16
|
HIV-1 quasispecies delineation by tag linkage deep sequencing. PLoS One 2014; 9:e97505. [PMID: 24842159 PMCID: PMC4026136 DOI: 10.1371/journal.pone.0097505] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/20/2014] [Accepted: 04/17/2014] [Indexed: 12/16/2022] Open
Abstract
Trade-offs between throughput, read length, and error rates in high-throughput sequencing limit certain applications such as monitoring viral quasispecies. Here, we describe a molecular-based tag linkage method that allows assemblage of short sequence reads into long DNA fragments. It enables haplotype phasing with high accuracy and sensitivity to interrogate individual viral sequences in a quasispecies. This approach is demonstrated to deduce ∼2000 unique 1.3 kb viral sequences from HIV-1 quasispecies in vivo and after passaging ex vivo with a detection limit of ∼0.005% to ∼0.001%. Reproducibility of the method is validated quantitatively and qualitatively by a technical replicate. This approach can improve monitoring of the genetic architecture and evolution dynamics in any quasispecies population.
|