1
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023] Open
Abstract
The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.
2
Abstract
Viruses, which are the most abundant biological entities on the planet, have been regarded as the "dark matter" of biology in the sense that despite their ubiquity and frequent presence in large numbers, their detection and analysis are not always straightforward. The majority of them are very small (falling under the limit of 0.5 μm), and collectively, they are extraordinarily diverse. In fact, the majority of the genetic diversity on the planet is found in the so-called virosphere, or the world of viruses. Furthermore, the most frequent viral agents of disease in humans display an RNA genome, and frequently evolve very fast, due to the fact that most of their polymerases are devoid of proofreading activity. Therefore, their detection, genetic characterization, and epidemiological surveillance are rather challenging. This review (part of the Curated Collection on Advances in Molecular Epidemiology of Infectious Diseases) describes many of the methods that, throughout the last few decades, have been used for viral detection and analysis. Despite the challenge of having to deal with high genetic diversity, the majority of these methods still depend on the amplification of viral genomic sequences, using sequence-specific or sequence-independent approaches, exploring thermal profiles or a single nucleic acid amplification temperature. Furthermore, viral populations, and especially those with RNA genomes, are not usually genetically uniform but encompass swarms of genetically related, though distinct, viral genomes known as viral quasispecies. Therefore, sequence analysis of viral amplicons needs to take this fact into consideration, as it constitutes a potential analytic problem. Possible technical approaches to deal with it are also described here. *This article is part of a curated collection.
3
Butt SL, Taylor TL, Volkening JD, Dimitrov KM, Williams-Coplin D, Lahmers KK, Miller PJ, Rana AM, Suarez DL, Afonso CL, Stanton JB. Rapid virulence prediction and identification of Newcastle disease virus genotypes using third-generation sequencing. Virol J 2018; 15:179. [PMID: 30466441 PMCID: PMC6251111 DOI: 10.1186/s12985-018-1077-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/05/2018] [Accepted: 10/10/2018] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND: Newcastle disease (ND) outbreaks are global challenges to the poultry industry. Effective management requires rapid identification and virulence prediction of the circulating Newcastle disease viruses (NDV), the causative agent of ND. However, these diagnostics are hindered by the genetic diversity and rapid evolution of NDVs. METHODS: An amplicon sequencing (AmpSeq) workflow for virulence and genotype prediction of NDV samples using a third-generation, real-time DNA sequencing platform is described here. 1D MinION sequencing of barcoded NDV amplicons was performed using 33 egg-grown isolates (15 NDV genotypes) and 15 clinical swab samples collected from field outbreaks. Assembly-based data analysis was performed in a customized, Galaxy-based AmpSeq workflow. MinION-based results were compared to previously published sequences and to sequences obtained using a previously published Illumina MiSeq workflow. RESULTS: For all egg-grown isolates, NDV was detected and virulence and genotype were accurately predicted. For clinical samples, NDV was detected in ten of eleven NDV samples. Six of the clinical samples contained two mixed genotypes as determined by MiSeq, of which the MinION method detected both genotypes in four samples. Additionally, testing a dilution series of one NDV isolate resulted in NDV detection in a dilution as low as 10^1 50% egg infectious dose per milliliter. This was accomplished in as little as 7 min of sequencing time, with a 98.37% sequence identity compared to the expected consensus obtained by MiSeq. CONCLUSION: The depth of sequencing, fast sequencing capabilities, accuracy of the consensus sequences, and the low cost of multiplexing allowed for effective virulence prediction and genotype identification of NDVs currently circulating worldwide. The sensitivity of this protocol was preliminarily tested using only one genotype. After more extensive evaluation of the sensitivity and specificity, this protocol will likely be applicable to the detection and characterization of NDV.
Affiliation(s)
- Salman L. Butt
  - Southeast Poultry Research Laboratory, US National Poultry Research Center, Agricultural Research Service, USDA, 934 College Station Road, Athens, GA 30605 USA
  - Department of Pathology, College of Veterinary Medicine, University of Georgia, Athens, GA 30602 USA
- Tonya L. Taylor
  - Southeast Poultry Research Laboratory, US National Poultry Research Center, Agricultural Research Service, USDA, 934 College Station Road, Athens, GA 30605 USA
- Kiril M. Dimitrov
  - Southeast Poultry Research Laboratory, US National Poultry Research Center, Agricultural Research Service, USDA, 934 College Station Road, Athens, GA 30605 USA
- Dawn Williams-Coplin
  - Southeast Poultry Research Laboratory, US National Poultry Research Center, Agricultural Research Service, USDA, 934 College Station Road, Athens, GA 30605 USA
- Kevin K. Lahmers
  - Department of Biomedical Sciences & Pathobiology, VA-MD College of Veterinary Medicine, Virginia Tech, Blacksburg, VA USA
- Patti J. Miller
  - Southeast Poultry Research Laboratory, US National Poultry Research Center, Agricultural Research Service, USDA, 934 College Station Road, Athens, GA 30605 USA
  - Department of Population Health, College of Veterinary Medicine, 953 College Station Road, Athens, GA 30602 USA
- Asif M. Rana
  - Hivet Animal Health Business, 667-P, Johar Town, Lahore, Pakistan
- David L. Suarez
  - Southeast Poultry Research Laboratory, US National Poultry Research Center, Agricultural Research Service, USDA, 934 College Station Road, Athens, GA 30605 USA
- Claudio L. Afonso
  - Southeast Poultry Research Laboratory, US National Poultry Research Center, Agricultural Research Service, USDA, 934 College Station Road, Athens, GA 30605 USA
- James B. Stanton
  - Department of Pathology, College of Veterinary Medicine, University of Georgia, Athens, GA 30602 USA
4
Patiño-Galindo JÁ, González-Candelas F. Molecular evolution methods to study HIV-1 epidemics. Future Virol 2018; 13:399-404. [PMID: 29967650 DOI: 10.2217/fvl-2017-0159] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/25/2017] [Accepted: 04/04/2018] [Indexed: 01/17/2023]
Abstract
Nucleotide sequences of HIV isolates are obtained routinely to evaluate the presence of resistance mutations to antiretroviral drugs. Beyond their clinical use, however, these and other viral sequences include a wealth of information that can be used to better understand and characterize the epidemiology of HIV in relevant populations. In this review, we provide a brief overview of the main methods used to analyze HIV sequences, the databases where reference sequences can be obtained, and some caveats about the possible applications for public health of these analyses, along with some considerations about their limitations and correct usage to derive robust and reliable conclusions.
Affiliation(s)
- Juan Á Patiño-Galindo
  - Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Fernando González-Candelas
  - Joint Research Unit "Infección y Salud Pública" FISABIO-Salud Pública/Universitat de València-Institute for Integrative Systems Biology (ISysBio, CSIC-UV), Valencia, Spain
  - CIBER in Epidemiology & Public Health, Valencia, Spain
5
Pitfalls of restriction enzyme analysis in identifying, characterizing, typing, and naming viral pathogens in the era of whole genome data, as illustrated by HAdV type 55. Virol Sin 2017; 31:448-453. [PMID: 27822718 DOI: 10.1007/s12250-016-3862-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/28/2022] Open
6
Van der Borght K, Thys K, Wetzels Y, Clement L, Verbist B, Reumers J, van Vlijmen H, Aerssens J. QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles. BMC Bioinformatics 2015; 16:379. [PMID: 26554718 PMCID: PMC4641353 DOI: 10.1186/s12859-015-0812-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/22/2015] [Accepted: 10/31/2015] [Indexed: 12/03/2022] Open
Abstract
Background: Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. Results: For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNVD). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNVHS). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNVHS-P80). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNVD performed similarly to the existing approaches. QQ-SNVHS was more sensitive on all test sets but with more false positives. QQ-SNVHS-P80 was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNVHS-P80 revealed a sensitivity of 100% (vs. 40–60% for the existing methods) and a specificity of 100% (vs. 98.0–99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNVHS-P80 from different generations of Illumina sequencers. Conclusions: We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
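The three calling modes named in this abstract differ only in how the estimated SNV probability and the variant frequency are thresholded, which can be illustrated with a minimal sketch. This is not the QQ-SNV implementation: the `Candidate` record, the function names, the use of `>=` at the cutoffs, and the way the error-frequency distribution is supplied are all assumptions made for illustration; only the cutoff values (0.5, 0.0001) and the 80th-percentile rule come from the abstract.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Candidate:
    """Hypothetical record for one candidate variant (not a QQ-SNV type)."""
    position: int
    frequency: float        # observed variant frequency, as a fraction of reads
    snv_probability: float  # classifier output in [0, 1], assumed precomputed


def call_by_probability(candidates, cutoff):
    """Keep candidates whose SNV probability reaches the cutoff.

    cutoff=0.5 mirrors the default mode and cutoff=0.0001 the
    high-sensitivity mode described in the abstract. Treating the
    cutoff as inclusive (>=) is an assumption.
    """
    return [c for c in candidates if c.snv_probability >= cutoff]


def call_high_sensitivity_p80(candidates, error_frequencies, cutoff=0.0001):
    """High-sensitivity calls, overruled when too rare to trust.

    A call is dropped when its frequency is below the 80th percentile of
    `error_frequencies`. How that distribution is built is not stated in
    the abstract, so it is passed in as a plain list here (assumption).
    """
    # quantiles(..., n=100) returns 99 cut points; index 79 is the 80th percentile.
    p80 = quantiles(error_frequencies, n=100, method="inclusive")[79]
    return [c for c in call_by_probability(candidates, cutoff)
            if c.frequency >= p80]
```

Lowering the probability cutoff admits more true low-frequency variants along with more errors, and the percentile filter then removes calls that are no more frequent than typical errors, which is the sensitivity/specificity trade-off the abstract reports.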
Affiliation(s)
- Koen Van der Borght
  - Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
  - Interuniversity Institute for Biostatistics and statistical Bioinformatics, Katholieke Universiteit Leuven, B-3000, Leuven, Belgium.
- Kim Thys
  - Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Yves Wetzels
  - Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Lieven Clement
  - Ghent University, Applied Mathematics, Informatics and Statistics, B-9000, Ghent, Belgium.
- Bie Verbist
  - Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Joke Reumers
  - Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
- Jeroen Aerssens
  - Janssen Infectious Diseases-Diagnostics BVBA, B-2340, Beerse, Belgium.
7
Rossi LMG, Escobar-Gutierrez A, Rahal P. Advanced molecular surveillance of hepatitis C virus. Viruses 2015; 7:1153-88. [PMID: 25781918 PMCID: PMC4379565 DOI: 10.3390/v7031153] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/17/2014] [Revised: 02/05/2015] [Accepted: 02/20/2015] [Indexed: 12/12/2022] Open
Abstract
Hepatitis C virus (HCV) infection is an important public health problem worldwide. HCV exploits complex molecular mechanisms, which result in a high degree of intrahost genetic heterogeneity. This high degree of variability represents a challenge for the accurate establishment of genetic relatedness between cases and complicates the identification of sources of infection. Tracking HCV infections is crucial for the elucidation of routes of transmission in a variety of settings. Therefore, implementation of HCV advanced molecular surveillance (AMS) is essential for disease control. Accounting for virulence is also important for HCV AMS and both viral and host factors contribute to the disease outcome. Therefore, HCV AMS requires the incorporation of host factors as an integral component of the algorithms used to monitor disease occurrence. Importantly, implementation of comprehensive global databases and data mining are also needed for the proper study of the mechanisms responsible for HCV transmission. Here, we review molecular aspects associated with HCV transmission, as well as the most recent technological advances used for virus and host characterization. Additionally, the cornerstone discoveries that have defined the pathway for viral characterization are presented and the importance of implementing advanced HCV molecular surveillance is highlighted.
Affiliation(s)
- Livia Maria Gonçalves Rossi
  - Department of Biology, Institute of Bioscience, Language and Exact Science, Sao Paulo State University, Sao Jose do Rio Preto, SP 15054-000, Brazil.
- Paula Rahal
  - Department of Biology, Institute of Bioscience, Language and Exact Science, Sao Paulo State University, Sao Jose do Rio Preto, SP 15054-000, Brazil.
8
Preciado MV, Valva P, Escobar-Gutierrez A, Rahal P, Ruiz-Tovar K, Yamasaki L, Vazquez-Chacon C, Martinez-Guarneros A, Carpio-Pedroza JC, Fonseca-Coronado S, Cruz-Rivera M. Hepatitis C virus molecular evolution: Transmission, disease progression and antiviral therapy. World J Gastroenterol 2014; 20:15992-16013. [PMID: 25473152 PMCID: PMC4239486 DOI: 10.3748/wjg.v20.i43.15992] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 05/07/2014] [Revised: 06/22/2014] [Accepted: 08/28/2014] [Indexed: 02/06/2023] Open
Abstract
Hepatitis C virus (HCV) infection represents an important public health problem worldwide. Reduction of HCV morbidity and mortality is a current challenge owing to several viral and host factors. Virus molecular evolution plays an important role in HCV transmission, disease progression and therapy outcome. The high degree of genetic heterogeneity characteristic of HCV is a key element for the rapid adaptation of the intrahost viral population to different selection pressures (e.g., host immune responses and antiviral therapy). HCV molecular evolution is shaped by different mechanisms including a high mutation rate, genetic bottlenecks, genetic drift, recombination, temporal variations and compartmentalization. These evolutionary processes constantly rearrange the composition of the HCV intrahost population in a staging manner. Remarkable advances in the understanding of the molecular mechanism controlling HCV replication have facilitated the development of a plethora of direct-acting antiviral agents against HCV. As a result, superior sustained viral responses have been attained. The rapidly evolving field of anti-HCV therapy is expected to broaden its landscape even further with newer, more potent antivirals, bringing us one step closer to the interferon-free era.
9
Boyd SD, Galli SJ, Schrijver I, Zehnder JL, Ashley EA, Merker JD. A Balanced Look at the Implications of Genomic (and Other "Omics") Testing for Disease Diagnosis and Clinical Care. Genes (Basel) 2014; 5:748-66. [PMID: 25257203 PMCID: PMC4198929 DOI: 10.3390/genes5030748] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/20/2014] [Revised: 07/20/2014] [Accepted: 08/18/2014] [Indexed: 11/16/2022] Open
Abstract
The tremendous increase in DNA sequencing capacity arising from the commercialization of "next generation" instruments has opened the door to innumerable routes of investigation in basic and translational medical science. It enables very large data sets to be gathered, whose interpretation and conversion into useful knowledge is only beginning. A challenge for modern healthcare systems and academic medical centers is to apply these new methods for the diagnosis of disease and the management of patient care without unnecessary delay, but also with appropriate evaluation of the quality of data and interpretation, as well as the clinical value of the insights gained. Most critically, the standards applied for evaluating these new laboratory data and ensuring that the results and their significance are clearly communicated to patients and their caregivers should be at least as rigorous as those applied to other kinds of medical tests. Here, we present an overview of conceptual and practical issues to be considered in planning for the integration of genomic methods or, in principle, any other type of "omics" testing into clinical care.
Affiliation(s)
- Scott D Boyd
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA.
- Stephen J Galli
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA.
- Iris Schrijver
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA.
- James L Zehnder
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA.
- Euan A Ashley
  - Department of Medicine, Stanford University, Stanford, CA 94305, USA.
- Jason D Merker
  - Department of Pathology, Stanford University, Stanford, CA 94305, USA.
10
Gonçalves Rossi LM, Rahal P. Challenges in molecular epidemiology of hepatitis C virus. J Clin Virol 2014; 60:174-6. [DOI: 10.1016/j.jcv.2014.03.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/18/2014] [Revised: 03/18/2014] [Accepted: 03/21/2014] [Indexed: 02/07/2023]
11
Hepatitis A virus: host interactions, molecular epidemiology and evolution. Infect Genet Evol 2013; 21:227-43. [PMID: 24200587 DOI: 10.1016/j.meegid.2013.10.023] [Citation(s) in RCA: 106] [Impact Index Per Article: 9.6] [Received: 07/28/2013] [Revised: 10/25/2013] [Accepted: 10/26/2013] [Indexed: 12/16/2022]
Abstract
Infection with hepatitis A virus (HAV) is the commonest viral cause of liver disease and presents an important public health problem worldwide. Several unique HAV properties and molecular mechanisms of its interaction with the host were recently discovered and should aid in clarifying the pathogenesis of hepatitis A. Genetic characterization of HAV strains has resulted in the identification of different genotypes and subtypes, which exhibit a characteristic worldwide distribution. Shifts in HAV endemicity occurring in different parts of the world, introduction of genetically diverse strains from geographically distant regions, genotype displacement observed in some countries and population expansion detected in the last decades of the 20th century using phylogenetic analysis are important factors contributing to the complex dynamics of HAV infections worldwide. Strong selection pressures, some of which, like usage of deoptimized codons, are unique to HAV, limit genetic variability of the virus. Analysis of subgenomic regions has proven useful for outbreak investigations. However, the sharing of short sequences among epidemiologically unrelated strains indicates that specific identification of HAV strains for molecular surveillance can be achieved only using whole-genome sequences. Here, we present up-to-date information on the HAV molecular epidemiology and evolution, and highlight the most relevant features of the HAV-host interactions.