Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Skums P, Dimitrova Z, Campo DS, Vaughan G, Rossi L, Forbi JC, Yokosawa J, Zelikovsky A, Khudyakov Y. Efficient error correction for next-generation sequencing of viral amplicons. BMC Bioinformatics 2012;13 Suppl 10:S6. [PMID: 22759430 PMCID: PMC3382444 DOI: 10.1186/1471-2105-13-s10-s6] [Citation(s) in RCA: 79] [Impact Index Per Article: 6.6] [Indexed: 02/05/2023]

Cited by Other Article(s)

Rosenberger G, Li W, Turunen M, He J, Subramaniam PS, Pampou S, Griffin AT, Karan C, Kerwin P, Murray D, Honig B, Liu Y, Califano A. Network-based elucidation of colon cancer drug resistance by phosphoproteomic time-series analysis. bioRxiv 2023:2023.02.15.528736. [PMID: 36824919 PMCID: PMC9949144 DOI: 10.1101/2023.02.15.528736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]

Affiliation(s)

George Rosenberger Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
Wenxue Li Yale Cancer Biology Institute, Yale University, West Haven, CT, USA
Mikko Turunen Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
Jing He Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA Present address: Regeneron Genetics Center, Tarrytown, NY, USA
Prem S Subramaniam Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
Sergey Pampou Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
Aaron T Griffin Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY, USA
Charles Karan Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
Patrick Kerwin Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
Diana Murray Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
Barry Honig Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
Yansheng Liu Yale Cancer Biology Institute, Yale University, West Haven, CT, USA Department of Pharmacology, Yale University School of Medicine, New Haven, CT, USA
Andrea Califano Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA

K-Mer Spectrum-Based Error Correction Algorithm for Next-Generation Sequencing Data. Comput Intell Neurosci 2022;2022:8077664. [PMID: 35875730 PMCID: PMC9303089 DOI: 10.1155/2022/8077664] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/18/2022] [Accepted: 06/13/2022] [Indexed: 11/26/2022]

Briscoe L, Balliu B, Sankararaman S, Halperin E, Garud NR. Evaluating supervised and unsupervised background noise correction in human gut microbiome data. PLoS Comput Biol 2022;18:e1009838. [PMID: 35130266 PMCID: PMC8853548 DOI: 10.1371/journal.pcbi.1009838] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/25/2021] [Revised: 02/17/2022] [Accepted: 01/15/2022] [Indexed: 12/13/2022]

Abstract

The ability to predict human phenotypes and identify biomarkers of disease from metagenomic data is crucial for the development of therapeutics for microbiome-associated diseases. However, metagenomic data is commonly affected by technical variables unrelated to the phenotype of interest, such as sequencing protocol, which can make it difficult to predict phenotype and find biomarkers of disease. Supervised methods to correct for background noise, originally designed for gene expression and RNA-seq data, are commonly applied to microbiome data but may be limited because they cannot account for unmeasured sources of variation. Unsupervised approaches address this issue, but current methods are limited because they are ill-equipped to deal with the unique aspects of microbiome data, which is compositional, highly skewed, and sparse. We perform a comparative analysis of the ability of different denoising transformations in combination with supervised correction methods as well as an unsupervised principal component correction approach that is presently used in other domains but has not been applied to microbiome data to date. We find that the unsupervised principal component correction approach is comparable to the supervised approaches in reducing false discovery of biomarkers, with the added benefit of not needing to know the sources of variation a priori. However, in prediction tasks, it appears to only improve prediction when technical variables contribute to the majority of variance in the data. As new and larger metagenomic datasets become increasingly available, background noise correction will become essential for generating reproducible microbiome analyses.

The human gut microbiome is known to play a major role in health and is associated with many diseases including colorectal cancer, obesity, and diabetes. The prediction of host phenotypes and identification of biomarkers of disease is essential for harnessing the therapeutic potential of the microbiome. However, many metagenomic datasets are affected by technical variables that introduce unwanted variation that can confound the ability to predict phenotypes and identify biomarkers. Currently, supervised methods originally designed for gene expression and RNA-seq data are commonly applied to microbiome data for correction of background noise, but they are limited in that they cannot correct for unmeasured sources of variation. Unsupervised approaches address this issue, but current methods are limited because they are ill-equipped to deal with the unique aspects of microbiome data, which is compositional, highly skewed, and sparse. We perform a comparative analysis of the ability of different denoising transformations in combination with supervised correction methods as well as an unsupervised principal component correction approach and find that all correction approaches reduce false positives for biomarker discovery. In the task of predicting phenotypes, different approaches have varying success where the unsupervised correction can improve prediction when technical variables contribute to the majority of variance in the data. As new and larger metagenomic datasets become increasingly available, background noise correction will become essential for generating reproducible microbiome analyses.

Affiliation(s)

Leah Briscoe Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, California, United States of America
Brunilda Balliu Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
Sriram Sankararaman Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
Eran Halperin Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America Institute of Precision Health, University of California Los Angeles, Los Angeles, California, United States of America
Nandita R. Garud Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, United States of America

Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021;49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]

Akiyama MJ, Lipsey D, Ganova-Raeva L, Punkova LT, Agyemang L, Sue A, Ramachandran S, Khudyakov Y, Litwin AH. A Phylogenetic Analysis of Hepatitis C Virus Transmission, Relapse, and Reinfection Among People Who Inject Drugs Receiving Opioid Agonist Therapy. J Infect Dis 2021;222:488-498. [PMID: 32150621 DOI: 10.1093/infdis/jiaa100] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/18/2019] [Accepted: 03/03/2020] [Indexed: 12/13/2022]

Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021;22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023]

Icer Baykal PB, Lara J, Khudyakov Y, Zelikovsky A, Skums P. Quantitative differences between intra-host HCV populations from persons with recently established and persistent infections. Virus Evol 2020;7:veaa103. [PMID: 33505710 PMCID: PMC7816669 DOI: 10.1093/ve/veaa103] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/14/2022]

Abstract

Detection of incident hepatitis C virus (HCV) infections is crucial for identification of outbreaks and development of public health interventions. However, there is no single diagnostic assay for distinguishing recent and persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to the detection of incident HCV infections and used to understand intra-host viral evolution. We studied intra-host HCV populations sampled using next-generation sequencing from 98 recently and 256 persistently infected individuals. Genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected features measuring their diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics, and physico-chemical properties. Distributions of the viral population features differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by decline in genomic complexity and increase in structuredness of the HCV population, likely reflecting a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a machine learning classifier for the infection staging, which yielded a detection accuracy of 95.22 per cent, thus providing a higher accuracy than other genomic-based models. The detection of a strong association between several HCV genetic factors and stages of infection suggests that intra-host HCV population develops in a complex but regular and predictable manner in the course of infection. The proposed models may serve as a foundation of cyber-molecular assays for staging infection, which could potentially complement and/or substitute standard laboratory assays.

Basodi S, Baykal PI, Zelikovsky A, Skums P, Pan Y. Analysis of heterogeneous genomic samples using image normalization and machine learning. BMC Genomics 2020;21:405. [PMID: 33349236 PMCID: PMC7751093 DOI: 10.1186/s12864-020-6661-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures.

RESULTS

We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. Such a representation allows the problems of classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on the data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy.

CONCLUSIONS

The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.

Tambe A, Pachter L. Barcode identification for single cell genomics. BMC Bioinformatics 2019;20:32. [PMID: 30654736 PMCID: PMC6337828 DOI: 10.1186/s12859-019-2612-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/23/2017] [Accepted: 01/07/2019] [Indexed: 02/07/2023]

Ramachandran S, Thai H, Forbi JC, Galang RR, Dimitrova Z, Xia GL, Lin Y, Punkova LT, Pontones PR, Gentry J, Blosser SJ, Lovchik J, Switzer WM, Teshale E, Peters P, Ward J, Khudyakov Y. A large HCV transmission network enabled a fast-growing HIV outbreak in rural Indiana, 2015. EBioMedicine 2018;37:374-381. [PMID: 30448155 PMCID: PMC6284413 DOI: 10.1016/j.ebiom.2018.10.007] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 08/01/2018] [Accepted: 10/02/2018] [Indexed: 12/27/2022]

Abstract

Background

A high prevalence (92.3%) of hepatitis C virus (HCV) co-infection among HIV patients identified during a large HIV outbreak associated with injection of oxymorphone in Indiana prompted genetic analysis of HCV strains.

Methods

Molecular epidemiological analysis of HCV-positive samples included genotyping, sampling intra-host HVR1 variants by next-generation sequencing (NGS) and constructing transmission networks using Global Hepatitis Outbreak and Surveillance Technology (GHOST).

Findings

Results from the 492 samples indicate predominance of HCV genotypes 1a (72.2%) and 3a (20.4%), and existence of 2 major endemic NS5B clusters involving 49.8% of the sequenced strains. Among 76 HIV co-infected patients, 60.5% segregated into 2 endemic clusters. NGS analyses of 281 cases identified 826,917 unique HVR1 sequences and 51 cases of mixed subtype/genotype infections. GHOST mapped 23 transmission clusters. One large cluster (n = 130) included 50 cases infected with ≥2 subtypes/genotypes and 43 cases co-infected with HIV. Rapid strain replacement and superinfection with different strains were found among 7 of 12 cases who were followed up.

Interpretation

GHOST enabled mapping of HCV transmission networks among persons who inject drugs (PWID). Findings of numerous transmission clusters, mixed-genotype infections and rapid succession of infections with different HCV strains indicate a high rate of HCV spread. Co-localization of HIV co-infected patients in the major HCV clusters suggests that HIV dissemination was enabled by existing HCV transmission networks that likely perpetuated HCV in the community for years. Identification of transmission networks is an important step to guiding efficient public health interventions for preventing and interrupting HCV and HIV transmission among PWID.

Fund

US Centers for Disease Control and Prevention, and US state and local public health departments.

Affiliation(s)

Sumathi Ramachandran Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA.
Hong Thai Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Joseph C Forbi Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Romeo Regi Galang Epidemic Intelligence Service, Centers for Disease Control and Prevention, Atlanta, GA, USA
Zoya Dimitrova Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Guo-Liang Xia Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Yulin Lin Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Lili T Punkova Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Pamela R Pontones Indiana State Department of Health, USA
Jessica Gentry Indiana State Department of Health, USA
Sara J Blosser Indiana State Department of Health, USA
Judith Lovchik Indiana State Department of Health, USA
William M Switzer Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention, USA
Eyasu Teshale Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Philip Peters Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention, USA
John Ward Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA
Yury Khudyakov Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral hepatitis, USA

Saleem S, Ali A, Khubaib B, Akram M, Fatima Z, Idrees M. Genetic diversity of Hepatitis C Virus in Pakistan using Next Generation Sequencing. J Clin Virol 2018;108:26-31. [PMID: 30219747 DOI: 10.1016/j.jcv.2018.09.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/03/2018] [Revised: 08/14/2018] [Accepted: 09/07/2018] [Indexed: 01/06/2023]

Lara J, Teka MA, Sims S, Xia GL, Ramachandran S, Khudyakov Y. HCV adaptation to HIV coinfection. Infect Genet Evol 2018;65:216-225. [PMID: 30075255 DOI: 10.1016/j.meegid.2018.07.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/24/2018] [Revised: 07/25/2018] [Accepted: 07/30/2018] [Indexed: 02/07/2023]

Abstract

Human immunodeficiency virus (HIV) infection is rising as a leading cause of morbidity and mortality among hepatitis C virus (HCV)-infected patients. Both viruses interact in co-infected hosts, which may affect their intra-host evolution, potentially leading to differing genetic composition of viral populations in co-infected (CIP) and mono-infected (MIP) patients. Here, we investigate genetic differences between intra-host variants of the HCV hypervariable region 1 (HVR1) sampled from CIP and MIP. Nucleotide (nt) sequences of intra-host HCV HVR1 variants (N = 28,622) obtained from CIP (N = 112) and MIP (n = 176) were represented using 148 physical-chemical (PhyChem) indexes of DNA nt dimers. Significant (p < .0001) differences in the means and frequency distributions of 7 PhyChem properties were found between HVR1 variants from both groups. Linear projection analysis of 29 PhyChem features extracted from such PhyChem properties showed that the CIP and MIP HVR1 variants have a distinct distribution in the modeled 2D-space, with only ~1.3% of PhyChem profiles (N = 6782), shared by all HVR1 variants, being found in both groups. Probabilistic neural network (PNN) and naïve Bayesian (NB) classifiers trained on the PhyChem features accurately classified HVR1 variants by the group in cross-validation experiments (AUROC ≥ 0.96). Similarly, both models showed a high accuracy (AUROC ≥ 0.95) when evaluated on a test dataset of HVR1 sequences obtained from 10 patients, data from whom were not used for model building. Both models performed at the expected lower accuracy on randomly labeled datasets in cross-validation experiments (AUROC = 0.50). The random-label trained PNN showed a similar drop in accuracy on the test dataset (AUROC = 0.48), indicating that the detected associations were unlikely due to random correlations. 
Marked differences in genetic composition of HCV HVR1 variants sampled from CIP and MIP suggest differing intra-host HCV evolution in the presence of HIV infection. PhyChem features identified here may be used for detection of HIV infection from intra-host HCV variants alone in co-infected patients, thus facilitating monitoring for HIV introduction to high-risk populations with high HCV prevalence.

Hathaway NJ, Parobek CM, Juliano JJ, Bailey JA. SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing. Nucleic Acids Res 2018;46:e21. [PMID: 29202193 PMCID: PMC5829576 DOI: 10.1093/nar/gkx1201] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 01/19/2017] [Revised: 11/16/2017] [Accepted: 11/20/2017] [Indexed: 01/08/2023]

Campo DS, Zhang J, Ramachandran S, Khudyakov Y. Transmissibility of intra-host hepatitis C virus variants. BMC Genomics 2017;18:881. [PMID: 29244001 PMCID: PMC5731494 DOI: 10.1186/s12864-017-4267-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/16/2022]

Abstract

Background

Intra-host hepatitis C virus (HCV) populations are genetically heterogeneous and organized in subpopulations. With the exception of blood transfusions, transmission of HCV occurs via a small number of genetic variants, the effect of which is frequently described as a bottleneck. Stochasticity of transmission associated with the bottleneck is usually used to explain genetic differences among HCV populations identified in the source and recipient cases, which may be further exacerbated by intra-host HCV evolution and differential biological capacity of HCV variants to successfully establish a population in a new host.

Results

Transmissibility was formulated as a property that can be measured from experimental Ultra-Deep Sequencing (UDS) data. The UDS data were obtained from one large hepatitis C outbreak involving an epidemiologically defined source and 18 recipient cases. k-Step networks of HCV variants were constructed and used to identify a potential association between transmissibility and network centrality of individual HCV variants from the source. An additional dataset obtained from nine other HCV outbreaks with known directionality of transmission was used for validation.

Transmissibility was not found to be dependent on high frequency of variants in the source, supporting the earlier observations of transmission of minority variants. Among all tested measures of centrality, the highest correlation of transmissibility was found with Hamming centrality (r = 0.720; p = 1.57 E-71). Correlation between genetic distances and differences in transmissibility among HCV variants from the source was found to be 0.3276 (Mantel Test, p = 9.99 E-5), indicating association between genetic proximity and transmissibility. A strong correlation ranging from 0.565–0.947 was observed between Hamming centrality and transmissibility in 7 of the 9 additional transmission clusters (p < 0.05).

Conclusions

Transmission is not an exclusively stochastic process. Transmissibility, as formally measured in this study, is associated with certain biological properties that also define location of variants in the genetic space occupied by the HCV strain from the source. The measure may also be applicable to other highly heterogeneous viruses. Besides improving accuracy of outbreak investigations, this finding helps with the understanding of molecular mechanisms contributing to establishment of chronic HCV infection.

Malhotra R, Jha M, Poss M, Acharya R. A random forest classifier for detecting rare variants in NGS data from viral populations. Comput Struct Biotechnol J 2017;15:388-395. [PMID: 28819548 PMCID: PMC5548337 DOI: 10.1016/j.csbj.2017.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/14/2017] [Revised: 07/01/2017] [Accepted: 07/03/2017] [Indexed: 11/28/2022]

Rytsareva I, Campo DS, Zheng Y, Sims S, Thankachan SV, Tetik C, Chirag J, Chockalingam SP, Sue A, Aluru S, Khudyakov Y. Efficient detection of viral transmissions with Next-Generation Sequencing data. BMC Genomics 2017;18:372. [PMID: 28589864 PMCID: PMC5461558 DOI: 10.1186/s12864-017-3732-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/23/2022]

Abstract

BACKGROUND

Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has been frequently used in the study of HCV outbreaks and transmission chains, helping to identify a cluster of sequences as linked by transmission if their genetic distances are below a previously defined threshold. However, HCV exists as a population of numerous variants in each infected individual, and it has been observed that minority variants in the source are often the ones responsible for transmission, a situation that precludes the use of a single sequence per individual because many such transmissions would be missed. The use of Next-Generation Sequencing immensely increases the sensitivity of transmission detection but brings a considerable computational challenge because all sequences need to be compared among all pairs of samples.

METHODS

We developed a three-step strategy that filters pairs of samples according to different criteria: (i) a k-mer Bloom filter, (ii) a Levenshtein filter and (iii) a filter of identical sequences. We applied these three filters on a set of samples that cover the spectrum of genetic relationships among HCV cases, from being part of the same transmission cluster, to belonging to different subtypes.

RESULTS

Our three-step filtering strategy rapidly removes 85.1% of all the pairwise sample comparisons and 91.0% of all pairwise sequence comparisons, accurately establishing which pairs of HCV samples are below the relatedness threshold.

CONCLUSIONS

We present a fast and efficient three-step filtering strategy that removes most sequence comparisons and accurately establishes transmission links of any threshold-based method. This highly efficient workflow will allow a faster response and molecular detection capacity, improving the rate of detection of viral transmissions with molecular data.

Palmer BA, Dimitrova Z, Skums P, Crosbie O, Kenny-Walsh E, Fanning LJ. Ultradeep Pyrosequencing of Hepatitis C Virus to Define Evolutionary Phenotypes. Bio Protoc 2017;7:e2284. [PMID: 34541061 DOI: 10.21769/bioprotoc.2284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2016] [Revised: 02/19/2017] [Accepted: 04/20/2017] [Indexed: 11/02/2022]

Palmer BA, Fanning LJ. Synonymous Co-Variation across the E1/E2 Gene Junction of Hepatitis C Virus Defines Virion Fitness. PLoS One 2016;11:e0167089. [PMID: 27880830 PMCID: PMC5120871 DOI: 10.1371/journal.pone.0167089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2016] [Accepted: 11/07/2016] [Indexed: 11/18/2022] Open

Hepatitis B virus resistance substitutions: long-term analysis by next-generation sequencing. Arch Virol 2016;161:2885-91. [PMID: 27447462 DOI: 10.1007/s00705-016-2959-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2016] [Accepted: 06/28/2016] [Indexed: 01/07/2023]

Campo DS, Roh HJ, Pearlman BL, Fierer DS, Ramachandran S, Vaughan G, Hinds A, Dimitrova Z, Skums P, Khudyakov Y. Increased Mitochondrial Genetic Diversity in Persons Infected With Hepatitis C Virus. Cell Mol Gastroenterol Hepatol 2016;2:676-684. [PMID: 28174739 PMCID: PMC5042856 DOI: 10.1016/j.jcmgh.2016.05.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/07/2016] [Accepted: 05/15/2016] [Indexed: 12/22/2022]

Sede M, Parra M, Manrique JM, Laufer N, Jones LR, Quarleri J. Evolution of hepatitis C virus in HIV coinfected patients under antiretroviral therapy. Infect Genet Evol 2016;43:186-96. [PMID: 27234841 DOI: 10.1016/j.meegid.2016.05.032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/19/2016] [Revised: 05/13/2016] [Accepted: 05/23/2016] [Indexed: 02/07/2023]

Affiliation(s)

Mariano Sede Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina
Micaela Parra Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina
Julieta M Manrique Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, 9 de Julio y Belgrano S/N, 9100 Trelew, Chubut, Argentina
Natalia Laufer Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina
Leandro R Jones Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, 9 de Julio y Belgrano S/N, 9100 Trelew, Chubut, Argentina
Jorge Quarleri Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina

Beltman JB, Urbanus J, Velds A, van Rooij N, Rohr JC, Naik SH, Schumacher TN. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells. BMC Bioinformatics 2016;17:151. [PMID: 27038897 PMCID: PMC4818877 DOI: 10.1186/s12859-016-0999-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/28/2015] [Accepted: 03/23/2016] [Indexed: 12/31/2022] Open

Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction. Brief Bioinform 2016;17:154-79. [PMID: 26026159 PMCID: PMC4719071 DOI: 10.1093/bib/bbv029] [Citation(s) in RCA: 177] [Impact Index Per Article: 22.1] [Received: 03/03/2015] [Revised: 04/09/2015] [Indexed: 12/23/2022] Open

Network Analysis of the Chronic Hepatitis C Virome Defines Hypervariable Region 1 Evolutionary Phenotypes in the Context of Humoral Immune Responses. J Virol 2015;90:3318-29. [PMID: 26719263 DOI: 10.1128/jvi.02995-15] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/25/2015] [Accepted: 12/22/2015] [Indexed: 02/06/2023] Open

Abstract

UNLABELLED

Hypervariable region 1 (HVR1) of hepatitis C virus (HCV) comprises the first 27 N-terminal amino acid residues of E2. It is classically seen as the most heterogeneous region of the HCV genome. In this study, we assessed HVR1 evolution by using ultradeep pyrosequencing for a cohort of treatment-naive, chronically infected patients over a short, 16-week period. Organization of the sequence set into connected components that represented single nucleotide substitution events revealed a network dominated by highly connected, centrally positioned master sequences. HVR1 phenotypes were observed to be under strong purifying (stationary) and strong positive (antigenic drift) selection pressures, which were coincident with advancing patient age and cirrhosis of the liver. It followed that stationary viromes were dominated by a single HVR1 variant surrounded by minor variants composed of conservative single amino acid substitution events. We present evidence to suggest that neutralization antibody efficacy was diminished for stationary-virome HVR1 variants. Our results identify the HVR1 network structure during chronic infection as the preferential dominance of a single variant within a narrow sequence space.

IMPORTANCE

HCV infection is often asymptomatic, and chronic infection is generally well established in advance of initial diagnosis and subsequent treatment. HVR1 can undergo rapid sequence evolution during acute infection, and the variant pool is typically seen to diverge away from ancestral sequences as infection progresses from the acute to the chronic phase. In this report, we describe HVR1 viromes in chronically infected patients that are defined by a dominant epitope located centrally within a narrow variant pool. Our findings suggest that weakened humoral immune activity, as a consequence of persistent chronic infection, allows for the acquisition and maintenance of host-specific adaptive mutations at HVR1 that reflect virus fitness.
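
The abstract above describes organizing sequences into connected components linked by single nucleotide substitutions. A minimal Python sketch of that idea follows, assuming equal-length aligned sequences and treating two sequences as neighbours when they differ at exactly one position. The function names `hamming_one` and `one_step_components` are hypothetical, and the quadratic pairwise scan is chosen for clarity rather than speed.

```python
from collections import defaultdict


def hamming_one(a, b):
    """True if a and b have equal length and differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def one_step_components(sequences):
    """Group distinct sequences into components of the one-substitution network."""
    seqs = list(dict.fromkeys(sequences))  # drop duplicates, keep input order
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming_one(seqs[i], seqs[j]):
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, seq in enumerate(seqs):
        groups[find(i)].append(seq)
    # Largest component first; sequences sorted within each component.
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


print(one_step_components(["AAAA", "AAAT", "AATT", "CCCC", "CCCG", "GGGG"]))
# [['AAAA', 'AAAT', 'AATT'], ['CCCC', 'CCCG'], ['GGGG']]
```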

Forbi JC, Layden JE, Phillips RO, Mora N, Xia GL, Campo DS, Purdy MA, Dimitrova ZE, Owusu DO, Punkova LT, Skums P, Owusu-Ofori S, Sarfo FS, Vaughan G, Roh H, Opare-Sem OK, Cooper RS, Khudyakov YE. Next-Generation Sequencing Reveals Frequent Opportunities for Exposure to Hepatitis C Virus in Ghana. PLoS One 2015;10:e0145530. [PMID: 26683463 PMCID: PMC4684299 DOI: 10.1371/journal.pone.0145530] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/11/2015] [Accepted: 12/04/2015] [Indexed: 12/14/2022] Open

Abstract

Globally, hepatitis C virus (HCV) infection is responsible for a large proportion of persons with liver disease, including cancer. The infection is highly prevalent in sub-Saharan Africa. West Africa was identified as a geographic origin of two HCV genotypes. However, little is known about the genetic composition of HCV populations in many countries of the region. Using conventional and next-generation sequencing (NGS), we identified and genetically characterized 65 HCV strains circulating among HCV-positive blood donors in Kumasi, Ghana. Phylogenetic analysis using consensus sequences derived from 3 genomic regions of the HCV genome, 5'-untranslated region, hypervariable region 1 (HVR1) and NS5B gene, consistently classified the HCV variants (n = 65) into genotype 1 (HCV-1, 15%) and genotype 2 (HCV-2, 85%). The Ghanaian and West African HCV-2 NS5B sequences were found completely intermixed in the phylogenetic tree, indicating a substantial genetic heterogeneity of HCV-2 in Ghana. Analysis of HVR1 sequences from intra-host HCV variants obtained by NGS showed that three donors were infected with >1 HCV strain, including infections with 2 genotypes. Two other donors share an HCV strain, indicating HCV transmission between them. The HCV-2 strain sampled from one donor was replaced with another HCV-2 strain after only 2 months of observation, indicating rapid strain switching. Bayesian analysis estimated that the HCV-2 strains in Ghana have been expanding since the 16th century. The blood donors in Kumasi, Ghana, are infected with a very heterogeneous HCV population of HCV-1 and HCV-2, with HCV-2 being prevalent. The detection of three cases of co- or super-infections and transmission linkage between 2 cases suggests frequent opportunities for HCV exposure among the blood donors and is consistent with the reported high HCV prevalence. The conditions for effective HCV-2 transmission have existed for ~3–4 centuries, indicating a long epidemic history of HCV-2 in Ghana.

Affiliation(s)

Joseph C. Forbi Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Jennifer E. Layden Department of Public Health Sciences, Loyola University Chicago, Maywood, Illinois, United States of America; Department of Medicine, Loyola University Chicago, Stritch School of Medicine, Maywood, IL, United States of America
Richard O. Phillips Komfo Anokye Teaching Hospital, Kumasi, Ghana, West Africa; Kwame Nkrumah University of Science and Technology, Kumasi, Ghana, West Africa
Nallely Mora Department of Public Health Sciences, Loyola University Chicago, Maywood, Illinois, United States of America
Guo-liang Xia Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
David S. Campo Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Michael A. Purdy Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Zoya E. Dimitrova Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Dorcas O. Owusu Komfo Anokye Teaching Hospital, Kumasi, Ghana, West Africa
Lili T. Punkova Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Pavel Skums Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Shirley Owusu-Ofori Komfo Anokye Teaching Hospital, Kumasi, Ghana, West Africa
Fred Stephen Sarfo Komfo Anokye Teaching Hospital, Kumasi, Ghana, West Africa; Kwame Nkrumah University of Science and Technology, Kumasi, Ghana, West Africa
Gilberto Vaughan Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Hajung Roh Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Ohene K. Opare-Sem Komfo Anokye Teaching Hospital, Kumasi, Ghana, West Africa
Richard S. Cooper Department of Public Health Sciences, Loyola University Chicago, Maywood, Illinois, United States of America
Yury E. Khudyakov Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America

Zhou B, Dong H, He Y, Sun J, Jin W, Xie Q, Fan R, Wang M, Li R, Chen Y, Xie S, Shen Y, Huang X, Wang S, Lu F, Jia J, Zhuang H, Locarnini S, Zhao GP, Jin L, Hou J. Composition and Interactions of Hepatitis B Virus Quasispecies Defined the Virological Response During Telbivudine Therapy. Sci Rep 2015;5:17123. [PMID: 26599443 PMCID: PMC4657086 DOI: 10.1038/srep17123] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/06/2015] [Accepted: 10/26/2015] [Indexed: 01/08/2023] Open

Affiliation(s)

Bin Zhou State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China
Hui Dong Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China
Yungang He CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Jian Sun State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China
Weirong Jin Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China; Shanghai Shenyou Biotechnology Co., Ltd., Shanghai, China
Qing Xie Department of Infectious Diseases, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
Rong Fan State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China
Minxian Wang CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Ran Li CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Yangyi Chen Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China
Shaoqing Xie Shanghai Shenyou Biotechnology Co., Ltd., Shanghai, China
Yan Shen Shanghai Shenyou Biotechnology Co., Ltd., Shanghai, China
Xin Huang CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Shengyue Wang Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China
Fengming Lu Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
Jidong Jia Liver Research Center, Beijing Friendship Hospital, Capital Medical University, Beijing, China
Hui Zhuang Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
Stephen Locarnini Victorian Infectious Diseases Reference Laboratory, North Melbourne, Victoria, Australia
Guo-Ping Zhao Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China; CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Department of Microbiology and Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China; State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences; Key Laboratory of Medical Molecular Virology affiliated to the Ministries of Education and Health, Shanghai Medical College and Department of Microbiology, School of Life Sciences; Fudan University, Shanghai, China
Li Jin CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences; Key Laboratory of Medical Molecular Virology affiliated to the Ministries of Education and Health, Shanghai Medical College and Department of Microbiology, School of Life Sciences; Fudan University, Shanghai, China
Jinlin Hou State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China; Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang University, Hangzhou, China

Jones LR, Sede M, Manrique JM, Quarleri J. Virus evolution during chronic hepatitis B virus infection as revealed by ultradeep sequencing data. J Gen Virol 2015;97:435-444. [PMID: 26581478 DOI: 10.1099/jgv.0.000344] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022] Open

Campo DS, Xia GL, Dimitrova Z, Lin Y, Forbi JC, Ganova-Raeva L, Punkova L, Ramachandran S, Thai H, Skums P, Sims S, Rytsareva I, Vaughan G, Roh HJ, Purdy MA, Sue A, Khudyakov Y. Accurate Genetic Detection of Hepatitis C Virus Transmissions in Outbreak Settings. J Infect Dis 2015;213:957-65. [PMID: 26582955 DOI: 10.1093/infdis/jiv542] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 08/15/2015] [Accepted: 10/08/2015] [Indexed: 12/18/2022] Open

Affiliation(s)

David S Campo Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Guo-Liang Xia Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Zoya Dimitrova Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Yulin Lin Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Joseph C Forbi Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Lilia Ganova-Raeva Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Lili Punkova Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Sumathi Ramachandran Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Hong Thai Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Pavel Skums Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Seth Sims Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Inna Rytsareva Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Gilberto Vaughan Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Ha-Jung Roh Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Michael A Purdy Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Amanda Sue Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
Yury Khudyakov Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia

Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. Database (Oxford) 2015;2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]

Abstract

Next-generation sequencing (NGS) innovations mark a compelling landmark in life science and have changed the direction of research in clinical oncology through their capacity to support cancer diagnosis and treatment. The aim of our portal, Comprehensive Resources for Cancer NGS Data Analysis (CRCDA), is to provide a collection of NGS tools and pipelines organized into classes, together with cancer pathways, databases and literature information from PubMed. The literature data were restricted to the 18 most common cancer types, such as breast cancer, colon cancer and other cancers that occur in the worldwide population. For convenience, NGS cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis are listed to provide out-of-the-box solutions for NGS data analysis, which may help researchers overcome challenges in selecting and configuring individual tools for analysing exome, whole-genome and transcriptome data. An extensive search page was developed that can be queried by (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge is finding and using the right tool for the right application. The objective of this work is to collect the tools in each category from the various places where they are available and to arrange them, along with other data, in a simple and user-friendly manner so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available for NGS data analysis in cancer. Given these factors, we believe that this website will be a useful resource for the NGS research community working on cancer.

Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.

Cacho A, Smirnova E, Huzurbazar S, Cui X. A Comparison of Base-calling Algorithms for Illumina Sequencing Technology. Brief Bioinform 2015;17:786-95. [PMID: 26443614 DOI: 10.1093/bib/bbv088] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/28/2015] [Indexed: 11/14/2022] Open

Liu Y, Chiaromonte F, Ross H, Malhotra R, Elleder D, Poss M. Error correction and statistical analyses for intra-host comparisons of feline immunodeficiency virus diversity from high-throughput sequencing data. BMC Bioinformatics 2015;16:202. [PMID: 26123018 PMCID: PMC4486422 DOI: 10.1186/s12859-015-0607-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/06/2014] [Accepted: 04/29/2015] [Indexed: 11/16/2022] Open

Abstract

Background

Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3′ half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology.

Results

Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies – and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dually and singly infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. Altogether, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy.
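
For orientation, the exponential-normal convolution model mentioned above can be sketched in its standard microarray background-correction form. This is not the authors' modified version for minor allele frequencies, which the abstract does not specify. The sketch assumes an observed value X = S + B, where the true signal S is exponential with mean `alpha` and the error B is normal with mean `mu` and standard deviation `sigma`; the corrected value is the conditional mean E[S | X = x], which is the mean of a normal distribution truncated at zero. The function and parameter names are hypothetical.

```python
import math


def normexp_signal(x, mu, sigma, alpha):
    """E[S | X = x] for X = S + B, S ~ Exponential(mean alpha), B ~ Normal(mu, sigma^2)."""
    # Given X = x, S is Normal(m, sigma^2) truncated to S >= 0.
    m = x - mu - sigma ** 2 / alpha
    z = m / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    # Mean of the truncated normal; cdf underflows to 0 for extremely negative z.
    return m + sigma * pdf / cdf


# Hypothetical parameters: error centred at 1% with sd 0.5%, mean true frequency 2%.
for observed in (0.0, 0.01, 0.05):
    print(observed, normexp_signal(observed, mu=0.01, sigma=0.005, alpha=0.02))
```

The corrected value is always positive and increases with the observed value; for observations far above the error level it approaches x - mu - sigma^2 / alpha.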

Conclusions

Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0607-z) contains supplementary material, which is available to authorized users.

Lin K, Smit S, Bonnema G, Sanchez-Perez G, de Ridder D. Making the difference: integrating structural variation detection tools. Brief Bioinform 2014;16:852-64. [PMID: 25504367 DOI: 10.1093/bib/bbu047] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 08/20/2014] [Indexed: 01/01/2023] Open

Wood GR, Burroughs NJ, Evans DJ, Ryabov EV. Error correction and diversity analysis of population mixtures determined by NGS. PeerJ 2014;2:e645. [PMID: 25405074 PMCID: PMC4232844 DOI: 10.7717/peerj.645] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/17/2014] [Accepted: 10/10/2014] [Indexed: 11/20/2022] Open

Skums P, Artyomenko A, Glebova O, Ramachandran S, Mandoiu I, Campo DS, Dimitrova Z, Zelikovsky A, Khudyakov Y. Computational framework for next-generation sequencing of heterogeneous viral populations using combinatorial pooling. Bioinformatics 2014;31:682-90. [PMID: 25359889 DOI: 10.1093/bioinformatics/btu726] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]

Affiliation(s)

Pavel Skums Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
Alexander Artyomenko Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
Olga Glebova Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
Sumathi Ramachandran Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
Ion Mandoiu Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
David S Campo Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
Zoya Dimitrova Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
Alex Zelikovsky Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
Yury Khudyakov Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Computer Science, Georgia State University, Atlanta, GA, USA; and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA

Analysis of the evolution and structure of a complex intrahost viral population in chronic hepatitis C virus mapped by ultradeep pyrosequencing. J Virol 2014;88:13709-21. [PMID: 25231312 DOI: 10.1128/jvi.01732-14] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Abstract

Hepatitis C virus (HCV) causes chronic infection in 50% to 80% of infected individuals. Hypervariable region 1 (HVR1) variability is frequently studied to gain an insight into the mechanisms of HCV adaptation during chronic infection, but the changes to and persistence of HCV subpopulations during intrahost evolution are poorly understood. In this study, we used ultradeep pyrosequencing (UDPS) to map the viral heterogeneity of a single patient over 9.6 years of chronic HCV genotype 4a infection. Informed error correction of the raw UDPS data was performed using a temporally matched clonal data set. The resultant data set reported the detection of low-frequency recombinants throughout the study period, implying that recombination is an active mechanism through which HCV can explore novel sequence space. The data indicate that polyvirus infection of hepatocytes has occurred but that the fitness quotients of recombinant daughter virions are too low for the daughter virions to compete against the parental genomes. The subpopulations of parental genomes contributing to the recombination events highlighted a dynamic virome where subpopulations of variants are in competition. In addition, we provide direct evidence that demonstrates the growth of subdominant populations to dominance in the absence of a detectable humoral response.

IMPORTANCE

Analysis of ultradeep pyrosequencing data sets derived from virus amplicons frequently relies on software tools that are not optimized for amplicon analysis, assume random incorporation of sequencing errors, and are focused on achieving higher specificity at the expense of sensitivity. Such analysis is further complicated by the presence of hypervariable regions. In this study, we made use of a temporally matched reference sequence data set to inform error correction algorithms. Using this methodology, we were able to (i) detect multiple instances of hepatitis C virus intrasubtype recombination at the E1/E2 junction (a phenomenon rarely reported in the literature) and (ii) interrogate the longitudinal quasispecies complexity of the virome. Parallel to the UDPS, isolation of IgG-bound virions was found to coincide with the collapse of specific viral subpopulations.

Chabria SB, Gupta S, Kozal MJ. Deep Sequencing of HIV: Clinical and Research Applications. Annu Rev Genomics Hum Genet 2014;15:295-325. [DOI: 10.1146/annurev-genom-091212-153406] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Sede MM, Moretti FA, Laufer NL, Jones LR, Quarleri JF. HIV-1 tropism dynamics and phylogenetic analysis from longitudinal ultra-deep sequencing data of CCR5- and CXCR4-using variants. PLoS One 2014;9:e102857. [PMID: 25032817 PMCID: PMC4102574 DOI: 10.1371/journal.pone.0102857] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Accepted: 06/25/2014] [Indexed: 11/25/2022] Open

Abstract

OBJECTIVE

Coreceptor switch from CCR5 to CXCR4 is associated with HIV disease progression. The molecular and evolutionary mechanisms underlying the CCR5 to CXCR4 switch are the focus of intense recent research. We studied the HIV-1 tropism dynamics in relation to coreceptor usage, the nature of quasispecies from ultra deep sequencing (UDPS) data and their phylogenetic relationships.

METHODS

Here, we characterized C2-V3-C3 sequences of HIV obtained from 19 patients followed up for 54 to 114 months using UDPS, with further genotyping and phylogenetic analysis for coreceptor usage. HIV quasispecies diversity and variability as well as HIV plasma viral load were measured longitudinally and their relationship with the HIV coreceptor usage was analyzed. The longitudinal UDPS data were submitted to phylogenetic analysis and sampling times and coreceptor usage were mapped onto the trees obtained.

RESULTS

Although a temporal viral genetic structuring was evident, the persistence of several viral lineages evolving independently along the infection was statistically supported, indicating a complex scenario for the evolution of viral quasispecies. HIV X4-using variants were present in most of our patients, exhibiting a dissimilar inter- and intra-patient predominance as the component of quasispecies even on antiretroviral therapy. The viral populations from some of the patients studied displayed evidence of the evolution of X4 variants through fitness valleys, whereas for other patients the data favored a gradual mode of emergence.

CONCLUSIONS

CXCR4 usage can emerge independently, in multiple lineages, along the course of HIV infection. The mode of emergence, i.e., gradual or through fitness valleys, seems to depend on both virus and patient factors. Furthermore, our analyses suggest that, besides becoming dominant after population-level switches, minor proportions of X4 viruses might exist along the infection, perhaps even at early stages of it. The fate of these minor variants might depend on both viral and host factors.

Campo DS, Dimitrova Z, Yamasaki L, Skums P, Lau DT, Vaughan G, Forbi JC, Teo CG, Khudyakov Y. Next-generation sequencing reveals large connected networks of intra-host HCV variants. BMC Genomics 2014;15 Suppl 5:S4. [PMID: 25081811 PMCID: PMC4120142 DOI: 10.1186/1471-2164-15-s5-s4] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open

Abstract

Background

Next-generation sequencing (NGS) allows for sampling numerous viral variants from infected patients. This provides a novel opportunity to represent and study the mutational landscape of Hepatitis C Virus (HCV) within a single host.

Results

Intra-host variants of the HCV E1/E2 region were extensively sampled from 58 chronically infected patients. After NGS error correction, the average number of reads and variants obtained from each sample were 3202 and 464, respectively. The distance between each pair of variants was calculated and networks were created for each patient, where each node is a variant and two nodes are connected by a link if the nucleotide distance between them is 1. The work focused on large components having > 5% of all reads, which on average account for 93.7% of all reads found in a patient.

The distance between any two variants calculated over the component correlated strongly with nucleotide distances (r = 0.9499; p = 0.0001), a better correlation than the one obtained with Neighbour-Joining trees (r = 0.7624; p = 0.0001). In each patient, components were well separated, with the average distance between (6.53%) being 10 times greater than within each component (0.68%). The ratio of nonsynonymous to synonymous changes was calculated and some patients (6.9%) showed a mixture of networks under strong negative and positive selection. All components were robust to in silico stochastic sampling; even after randomly removing 85% of all reads, the largest connected component in the new subsample still involved 82.4% of remaining nodes. In vitro sampling showed that 93.02% of components present in the original sample were also found in experimental replicas, with 81.6% of reads found in both. When syringe-sharing transmission events were simulated, 91.2% of all simulated transmission events seeded all components present in the source.
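The network construction described in these results can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes pre-aligned, equal-length variant sequences, interprets "nucleotide distance of 1" as a Hamming distance of 1, and uses a hypothetical function name (`large_components`) and made-up toy sequences and read counts.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def large_components(read_counts, min_read_fraction=0.05):
    """Group variants into single-mutation components and keep the large ones.

    read_counts: dict mapping an aligned variant sequence to its read count.
    Returns a list of (set_of_variants, fraction_of_reads), largest first.
    """
    variants = list(read_counts)
    # Link two variants when they differ at exactly one nucleotide.
    neighbours = defaultdict(set)
    for a, b in combinations(variants, 2):
        if len(a) == len(b) and hamming(a, b) == 1:
            neighbours[a].add(b)
            neighbours[b].add(a)

    total_reads = sum(read_counts.values())
    seen = set()
    components = []
    for start in variants:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fraction = sum(read_counts[v] for v in component) / total_reads
        # Keep only components holding more than the chosen share of reads.
        if fraction > min_read_fraction:
            components.append((component, fraction))
    return sorted(components, key=lambda c: c[1], reverse=True)


# Toy data (hypothetical): variant sequence -> number of reads.
toy = {"ACGT": 50, "ACGA": 30, "ACCA": 10, "TTTT": 6, "TTTA": 3, "GGGG": 1}
for members, fraction in large_components(toy):
    print(sorted(members), round(fraction, 2))
```

The all-pairs comparison is quadratic in the number of variants, which is workable for a few hundred variants per sample (the abstract reports an average of 464) but would need indexing for much larger sets.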

Conclusions

Most intra-host variants are organized into distinct single-mutation components that are: well separated from each other, represent genetic distances between viral variants, robust to sampling, reproducible and likely seeded during transmission events. Facilitated by NGS, large components offer a novel evolutionary framework for genetic analysis of intra-host viral populations and understanding transmission, immune escape and drug resistance.

Giannuzzi G, Migliavacca E, Reymond A. Novel H3K4me3 marks are enriched at human- and chimpanzee-specific cytogenetic structures. Genome Res 2014;24:1455-68. [PMID: 24916972 PMCID: PMC4158755 DOI: 10.1101/gr.167742.113] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]

Xiaobai Z, Xi C, Tian H, Williams AB, Wang H, He J, Zhen J, Chiarella J, Blake LA, Turenchalk G, Kozal MJ. Prevalence of WHO transmitted drug resistance mutations by deep sequencing in antiretroviral-naïve subjects in Hunan Province, China. PLoS One 2014;9:e98740. [PMID: 24896087 PMCID: PMC4045886 DOI: 10.1371/journal.pone.0098740] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Accepted: 05/07/2014] [Indexed: 11/18/2022] Open

Abstract

BACKGROUND

There are few data on the prevalence of WHO transmitted drug resistance mutations (TDRs) that could affect treatment responses to first line antiretroviral therapy (ART) in Hunan Province, China.

OBJECTIVE

Determine the prevalence of WHO NRTI/NNRTI/PI TDRs in ART-naïve subjects in Hunan Province by deep sequencing.

METHODS

ART-naïve subjects diagnosed in Hunan between 2010-2011 were evaluated by deep sequencing for low-frequency HIV variants possessing WHO TDRs to 1% levels. Mutations were scored using the HIVdb.stanford.edu algorithm to infer drug susceptibility.

RESULTS

Deep sequencing was performed on samples from 90 ART-naïve subjects; 83.3% were AE subtype. All subjects had advanced disease (average CD4 count 134 cells/mm³). Overall, 25.6% (23/90) of subjects had HIV with major WHO NRTI/NNRTI TDRs by deep sequencing at a variant frequency level ≥ 1%; 16.7% (15/90) had NRTI TDR and 12.2% (11/90) had a major NNRTI TDR. The majority of NRTI/NNRTI mutations were identified at variant levels <5%. Mutations were analyzed by HIVdb.stanford.edu and 7.8% of subjects had variants with high-level nevirapine resistance; 4.4% had high-level NRTI resistance. Deep sequencing identified 24 (27.6%) subjects with variants possessing either a PI TDR or HIVdb.stanford.edu PI mutation (algorithm value ≥ 15); 17 (19.5%) had PI TDRs at levels >1%.

CONCLUSIONS

ART-naïve subjects from Hunan Province, China, infected predominantly with subtype AE, frequently possessed HIV variants with WHO NRTI/NNRTI TDRs by deep sequencing that would affect the first line ART used in the region. Specific mutations conferring nevirapine high-level resistance were identified in 7.8% of subjects. The majority of TDRs detected were at variant levels <5%, likely due to subjects having advanced chronic disease at the time of testing. PI TDRs were identified frequently, but were found in isolation and at low variant frequency. As PI/r use is infrequent in Hunan, the existence of PI mutations likely represents AE subtype natural polymorphism at low variant level frequency.

Knief C. Analysis of plant microbe interactions in the era of next generation sequencing technologies. FRONTIERS IN PLANT SCIENCE 2014;5:216. [PMID: 24904612 PMCID: PMC4033234 DOI: 10.3389/fpls.2014.00216] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/15/2014] [Accepted: 04/30/2014] [Indexed: 05/18/2023]

Töpfer A, Marschall T, Bull RA, Luciani F, Schönhuth A, Beerenwinkel N. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol 2014;10:e1003515. [PMID: 24675810 PMCID: PMC3967922 DOI: 10.1371/journal.pcbi.1003515] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2013] [Accepted: 01/31/2014] [Indexed: 11/25/2022] Open

Forbi JC, Campo DS, Purdy MA, Dimitrova ZE, Skums P, Xia GL, Punkova LT, Ganova-Raeva LM, Vaughan G, Ben-Ayed Y, Switzer WM, Khudyakov YE. Intra-host diversity and evolution of hepatitis C virus endemic to Côte d'Ivoire. J Med Virol 2014;86:765-71. [PMID: 24519518 DOI: 10.1002/jmv.23897] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/14/2014] [Indexed: 12/12/2022]

Drug resistance of a viral population and its individual intrahost variants during the first 48 hours of therapy. Clin Pharmacol Ther 2014;95:627-35. [PMID: 24488144 PMCID: PMC4215939 DOI: 10.1038/clpt.2014.20] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2013] [Accepted: 01/22/2014] [Indexed: 12/20/2022]

Glouzon JPS, Bolduc F, Wang S, Najmanovich RJ, Perreault JP. Deep-sequencing of the peach latent mosaic viroid reveals new aspects of population heterogeneity. PLoS One 2014;9:e87297. [PMID: 24498066 PMCID: PMC3907566 DOI: 10.1371/journal.pone.0087297] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2013] [Accepted: 12/24/2013] [Indexed: 01/04/2023] Open

Abstract

Viroids are small circular single-stranded infectious RNAs characterized by a relatively high mutation level. Knowledge of their sequence heterogeneity remains largely elusive and previous studies, using Sanger sequencing, were based on a limited number of sequences. In an attempt to address sequence heterogeneity from a population dynamics perspective, a GF305-indicator peach tree was infected with a single variant of the Avsunviroidae family member Peach latent mosaic viroid (PLMVd). Six months post-inoculation, full-length circular conformers of PLMVd were isolated and deep-sequenced. We devised an original approach to the bioinformatics refinement of our sequence libraries involving important phenotypic data, based on the systematic analysis of hammerhead self-cleavage activity. Two distinct libraries yielded a total of 3,939 different PLMVd variants. Sequence variants exhibiting up to ∼17% of mutations relative to the inoculated viroid were retrieved, clearly illustrating the high level of divergence dynamics within a unique population. While we initially assumed that most positions of the viroid sequence would mutate, we were surprised to discover that ∼50% of positions remained perfectly conserved, including several small stretches as well as a small motif reminiscent of a GNRA tetraloop which are the result of various selective pressures. Using a hierarchical clustering algorithm, the different variants harvested were subdivided into 7 clusters. We found that most sequences contained an average of 4.6 to 6.4 mutations compared to the variant used to initially inoculate the plant. Interestingly, it was possible to reconstitute and compare the sequence evolution of each of these clusters. In doing so, we identified several key mutations. This study provides a reliable pipeline for the treatment of viroid deep-sequencing. It also sheds new light on the extent of sequence variation that a viroid population can sustain, and which may give rise to a quasispecies.

McElroy K, Thomas T, Luciani F. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions. MICROBIAL INFORMATICS AND EXPERIMENTATION 2014;4:1. [PMID: 24428920 PMCID: PMC3902414 DOI: 10.1186/2042-5783-4-1] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/14/2013] [Accepted: 01/07/2014] [Indexed: 12/15/2022]

Prosperi MCF, Yin L, Nolan DJ, Lowe AD, Goodenow MM, Salemi M. Empirical validation of viral quasispecies assembly algorithms: state-of-the-art and challenges. Sci Rep 2013;3:2837. [PMID: 24089188 PMCID: PMC3789152 DOI: 10.1038/srep02837] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2013] [Accepted: 09/13/2013] [Indexed: 11/22/2022] Open

McElroy K, Zagordi O, Bull R, Luciani F, Beerenwinkel N. Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias. BMC Genomics 2013;14:501. [PMID: 23879730 PMCID: PMC3848937 DOI: 10.1186/1471-2164-14-501] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2012] [Accepted: 07/15/2013] [Indexed: 11/10/2022] Open

Skums P, Mancuso N, Artyomenko A, Tork B, Mandoiu I, Khudyakov Y, Zelikovsky A. Reconstruction of viral population structure from next-generation sequencing data using multicommodity flows. BMC Bioinformatics 2013;14 Suppl 9:S2. [PMID: 23902469 PMCID: PMC3698000 DOI: 10.1186/1471-2105-14-s9-s2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Highly mutable RNA viruses exist in infected hosts as heterogeneous populations of genetically close variants known as quasispecies. Next-generation sequencing (NGS) allows for analysing a large number of viral sequences from infected patients, presenting a novel opportunity for studying the structure of a viral population and understanding virus evolution, drug resistance and immune escape. Accurate reconstruction of genetic composition of intra-host viral populations involves assembling the NGS short reads into whole-genome sequences and estimating frequencies of individual viral variants. Although a few approaches were developed for this task, accurate reconstruction of quasispecies populations remains largely unresolved.

RESULTS

Two new methods, AmpMCF and ShotMCF, for reconstruction of the whole-genome intra-host viral variants and estimation of their frequencies were developed, based on Multicommodity Flows (MCFs). AmpMCF was designed for NGS reads obtained from individual PCR amplicons and ShotMCF for NGS shotgun reads. While AmpMCF, based on covering formulation, identifies a minimal set of quasispecies explaining all observed reads, ShotMCF, based on packing formulation, engages the maximal number of reads to generate the most probable set of quasispecies. Both methods were evaluated on simulated data in comparison to Maximum Bandwidth and ViSpA, previously developed state-of-the-art algorithms for estimating quasispecies spectra from the NGS amplicon and shotgun reads, respectively. Both algorithms were accurate in estimation of quasispecies frequencies, especially from large datasets.

CONCLUSIONS

The problem of viral population reconstruction from amplicon or shotgun NGS reads was solved using the MCF formulation. The two methods, ShotMCF and AmpMCF, developed here afford accurate reconstruction of the structure of intra-host viral population from NGS reads. The implementations of the algorithms are available at http://alan.cs.gsu.edu/vira.html (AmpMCF) and http://alan.cs.gsu.edu/NGS/?q=content/shotmcf (ShotMCF).

Niklas N, Pröll J, Danzer M, Stabentheiner S, Hofer K, Gabriel C. Routine performance and errors of 454 HLA exon sequencing in diagnostics. BMC Bioinformatics 2013;14:176. [PMID: 23731822 PMCID: PMC3679934 DOI: 10.1186/1471-2105-14-176] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2012] [Accepted: 05/30/2013] [Indexed: 11/25/2022] Open

Abstract

Background

Next-generation sequencing (NGS) has changed genomics significantly. More and more applications strive for sequencing with different platforms. Now, in 2012, after a decade of development and evolution, NGS has been accepted for a variety of research fields. Determination of sequencing errors is essential in order to follow next-generation sequencing beyond research use only. This study describes the overall 454 system performance of using multiple GS Junior runs with an in-house established and validated diagnostic assay for human leukocyte antigen (HLA) exon sequencing. Based on this data, we extracted, evaluated and characterized errors and variants of 60 HLA loci per run with respect to their adjacencies.

Results

We determined an overall error rate of 0.18% in a total of 118,484,408 bases. 31.3% of all reads analyzed (n=349,503) contain one or more errors. The largest group are deletions that account for 50% of the errors. Incorrect bases are not distributed equally along sequences and tend to be more frequent at sequence ends. Certain sequence positions in the middle or at the beginning of the read accumulate errors. Typically, the corresponding quality score at the actual error position is lower than the adjacent scores.
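As a rough sanity check on the scale of these figures, the short Python sketch below converts the reported percentages into approximate counts. The inputs are the values quoted in the abstract; the derived counts are back-of-the-envelope estimates, not numbers reported by the authors, and the mean read length assumes that the base total and the read total describe the same set of reads, which the abstract does not state explicitly.

```python
# Values quoted in the abstract.
total_bases = 118_484_408
error_rate = 0.0018                # 0.18% overall error rate
reads = 349_503
reads_with_error_fraction = 0.313  # 31.3% of reads contain >= 1 error
deletion_share = 0.50              # deletions account for 50% of errors

# Derived estimates (assumptions, not reported results).
approx_errors = total_bases * error_rate
approx_reads_with_errors = reads * reads_with_error_fraction
approx_deletions = approx_errors * deletion_share
mean_read_length = total_bases / reads  # assumes both totals cover the same reads

print(f"erroneous bases   ~ {approx_errors:,.0f}")
print(f"reads with errors ~ {approx_reads_with_errors:,.0f}")
print(f"deletion errors   ~ {approx_deletions:,.0f}")
print(f"mean read length  ~ {mean_read_length:.0f} bases")
```

Under these assumptions the figures imply on the order of two hundred thousand erroneous bases spread over roughly one hundred thousand affected reads, so an affected read carries about two errors on average.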

Conclusions

Here we present the first error assessment in a human next-generation sequencing diagnostics assay in an amplicon sequencing approach. Improvements of sequence quality and error rate that have been made over the years are evident and it is shown that both have now reached a level where diagnostic applications become feasible. Our presented data are better than previously published error rates and we can confirm and quantify the often described relation of homopolymers and errors. Nevertheless, a certain depth of coverage is needed, in particular with challenging areas of the sequencing target. Furthermore, the usage of error correcting tools is not essential but might contribute towards the capacity and efficiency of a sequencing run.
