1
Rosenberger G, Li W, Turunen M, He J, Subramaniam PS, Pampou S, Griffin AT, Karan C, Kerwin P, Murray D, Honig B, Liu Y, Califano A. Network-based elucidation of colon cancer drug resistance by phosphoproteomic time-series analysis. bioRxiv: The Preprint Server for Biology 2023:2023.02.15.528736. [PMID: 36824919 PMCID: PMC9949144 DOI: 10.1101/2023.02.15.528736]
Abstract
Aberrant signaling pathway activity is a hallmark of tumorigenesis and progression, and it has guided targeted inhibitor design for over 30 years. Yet adaptive resistance mechanisms, induced by rapid, context-specific rewiring of signaling networks, continue to challenge therapeutic efficacy. Leveraging progress in proteomic technologies and network-based methodologies over the past decade, we developed VESPA, an algorithm designed to elucidate mechanisms of cell response and adaptation to drug perturbations, and used it to analyze 7-point phosphoproteomic time series from colorectal cancer cells treated with clinically relevant inhibitors and control media. Interrogation of tumor-specific enzyme/substrate interactions accurately inferred kinase and phosphatase activity based on the inferred phosphorylation state of their substrates, effectively accounting for signal cross-talk and sparse phosphoproteome coverage. The analysis elucidated the time-dependent response of signaling pathways to each drug perturbation. More importantly, it revealed adaptive cell responses and rewiring that were experimentally confirmed by CRISPR knockout (CRISPRko) assays, suggesting broad applicability to cancer and other diseases.
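The abstract describes inferring kinase and phosphatase activity from the phosphorylation state of each enzyme's substrates. The Python sketch below illustrates that general idea only; it is not VESPA. It substitutes a simple signed Stouffer combination of substrate z-scores for the algorithm's actual statistics. The site IDs, the enzyme/substrate sets (`regulons`) and the `min_substrates` cutoff are all hypothetical. Skipping enzymes with too few measured substrates is one crude way to cope with sparse phosphoproteome coverage.

```python
import math

def enzyme_activity(phospho_z, regulons, min_substrates=3):
    """Score each enzyme from the phosphorylation changes of its known substrates.

    phospho_z: dict mapping phosphosite ID -> z-score of change vs. control.
    regulons: dict mapping enzyme -> {phosphosite ID: mode}, where mode is +1 for a
        kinase-like (phosphorylating) edge and -1 for a phosphatase-like edge.
    Returns dict enzyme -> signed Stouffer-style score. Enzymes with fewer than
    `min_substrates` measured sites are skipped (sparse coverage).
    """
    scores = {}
    for enzyme, targets in regulons.items():
        measured = [(site, mode) for site, mode in targets.items() if site in phospho_z]
        if len(measured) < min_substrates:
            continue  # not enough observed substrates to estimate activity
        total = sum(mode * phospho_z[site] for site, mode in measured)
        scores[enzyme] = total / math.sqrt(len(measured))
    return scores


# Hypothetical example for a single time point (made-up site IDs and z-scores).
z = {"s1": 2.0, "s2": 1.0, "s3": 3.0, "s4": -2.0}
regulons = {
    "KIN": {"s1": 1, "s2": 1, "s3": 1},      # substrates gain phosphorylation
    "PHOS": {"s1": -1, "s2": -1, "s4": -1},  # phosphatase edges enter with sign -1
    "RARE": {"s1": 1, "s9": 1},              # only one measured site, so skipped
}
print(enzyme_activity(z, regulons))
```

Running the function once per time point would turn the scores into an activity time series. The sketch does not model cross-talk between enzymes.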
Affiliation(s)
- George Rosenberger
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Wenxue Li
  - Yale Cancer Biology Institute, Yale University, West Haven, CT, USA
- Mikko Turunen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Jing He
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Present address: Regeneron Genetics Center, Tarrytown, NY, USA
- Prem S Subramaniam
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Sergey Pampou
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
- Aaron T Griffin
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - Medical Scientist Training Program, Columbia University Irving Medical Center, New York, NY, USA
- Charles Karan
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
- Patrick Kerwin
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Diana Murray
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Barry Honig
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Yansheng Liu
  - Yale Cancer Biology Institute, Yale University, West Haven, CT, USA
  - Department of Pharmacology, Yale University School of Medicine, New Haven, CT, USA
- Andrea Califano
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
  - J.P. Sulzberger Columbia Genome Center, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
  - Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Biochemistry & Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
2
K-Mer Spectrum-Based Error Correction Algorithm for Next-Generation Sequencing Data. Computational Intelligence and Neuroscience 2022; 2022:8077664. [PMID: 35875730 PMCID: PMC9303089 DOI: 10.1155/2022/8077664] [Received: 05/18/2022] [Accepted: 06/13/2022]
Abstract
In the mid-1970s, the first-generation sequencing technique (Sanger sequencing) was created. It was implemented on Applied Biosystems sequencing instruments and Beckman's GeXP genetic analysis technology. Second-generation sequencing (2GS) arrived just a few years after the first human genome was published in 2003. 2GS devices are much faster than Sanger sequencing equipment, with considerably lower costs and far higher throughput in the form of short reads. The third-generation sequencing (3GS) method, first introduced in 2005, offers further reduced costs and higher throughput. Despite these successive generations of technology, sequencing remains error-prone, and it produces a very large number of reads. Analyzing this massive amount of data can help decode the secrets of life, detect infections, develop improved crops and improve quality of life, among other applications. The task is challenging, complicated not only by the large number of reads but also by sequencing errors. Error correction, which identifies and corrects errors in reads, is therefore a crucial step in data processing. As this work demonstrates, the performance of k-spectrum-based error correction algorithms can be influenced by characteristics such as coverage depth, read length and genome size. Time and effort must therefore be invested in selecting appropriate error correction approaches for a given NGS dataset.
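The k-spectrum idea the abstract refers to can be sketched as follows. The sketch assumes a simple scheme: k-mers seen at least `min_count` times are treated as "solid" (likely correct), and each read is repaired by the single-base substitution that leaves the fewest non-solid k-mers. This is a generic illustration, not any of the specific algorithms compared in the paper. The values `k = 5` and `min_count = 3` are arbitrary toy choices.

```python
from collections import Counter

def kmer_spectrum(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def weak_kmers(read, k, solid):
    """Number of k-mers in `read` that are not solid (too rare to trust)."""
    return sum(read[i:i + k] not in solid for i in range(len(read) - k + 1))

def correct_read(read, k, solid, alphabet="ACGT"):
    """Try every single-base substitution; keep the one leaving the fewest weak k-mers."""
    best, best_weak = read, weak_kmers(read, k, solid)
    if best_weak == 0:
        return read  # every k-mer is solid: treat the read as error-free
    for i, base in enumerate(read):
        for alt in alphabet:
            if alt == base:
                continue
            candidate = read[:i] + alt + read[i + 1:]
            n_weak = weak_kmers(candidate, k, solid)
            if n_weak < best_weak:
                best, best_weak = candidate, n_weak
    return best

def correct_reads(reads, k=5, min_count=3):
    counts = kmer_spectrum(reads, k)
    solid = {kmer for kmer, c in counts.items() if c >= min_count}
    return [correct_read(r, k, solid) for r in reads]


deep = ["GATTACAGGCT"] * 4 + ["GATTATAGGCT"]     # last read carries one substitution
shallow = ["GATTACAGGCT"] * 2 + ["GATTATAGGCT"]  # same reads at lower coverage
print(correct_reads(deep))     # the substitution is corrected
print(correct_reads(shallow))  # true k-mers fall below min_count; error is kept
```

The second call shows why coverage depth matters. With only two error-free copies, the true k-mers are not counted often enough to be solid, so the erroneous read cannot be repaired.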
3
Briscoe L, Balliu B, Sankararaman S, Halperin E, Garud NR. Evaluating supervised and unsupervised background noise correction in human gut microbiome data. PLoS Comput Biol 2022; 18:e1009838. [PMID: 35130266 PMCID: PMC8853548 DOI: 10.1371/journal.pcbi.1009838] [Received: 03/25/2021] [Revised: 02/17/2022] [Accepted: 01/15/2022]
Abstract
The ability to predict human phenotypes and identify biomarkers of disease from metagenomic data is crucial for the development of therapeutics for microbiome-associated diseases. However, metagenomic data are commonly affected by technical variables unrelated to the phenotype of interest, such as sequencing protocol, which can make it difficult to predict phenotypes and find biomarkers of disease. Supervised methods to correct for background noise, originally designed for gene expression and RNA-seq data, are commonly applied to microbiome data. They may be limited, however, because they cannot account for unmeasured sources of variation. Unsupervised approaches address this issue, but current methods are ill-equipped to deal with the unique aspects of microbiome data, which are compositional, highly skewed and sparse. We compare two kinds of correction in their ability to reduce false biomarker discovery and improve phenotype prediction. The first is a set of denoising transformations combined with supervised correction methods. The second is an unsupervised principal component correction approach that is used in other domains but had not previously been applied to microbiome data. We find that the unsupervised principal component correction approach reduces false discovery of biomarkers about as well as the supervised approaches, with the added benefit of not needing to know the sources of variation a priori. In prediction tasks, however, it appears to improve prediction only when technical variables contribute the majority of variance in the data. As new and larger metagenomic datasets become increasingly available, background noise correction will become essential for generating reproducible microbiome analyses.
The human gut microbiome is known to play a major role in health and is associated with many diseases, including colorectal cancer, obesity and diabetes. The prediction of host phenotypes and identification of biomarkers of disease are essential for harnessing the therapeutic potential of the microbiome. However, many metagenomic datasets are affected by technical variables that introduce unwanted variation, which can confound the ability to predict phenotypes and identify biomarkers. Supervised methods originally designed for gene expression and RNA-seq data are currently applied to microbiome data to correct background noise, but they cannot correct for unmeasured sources of variation. Unsupervised approaches address this issue, but current methods are ill-equipped to deal with the unique aspects of microbiome data, which are compositional, highly skewed and sparse. We compare different denoising transformations combined with supervised correction methods, as well as an unsupervised principal component correction approach, and find that all correction approaches reduce false positives in biomarker discovery. In predicting phenotypes, the approaches vary in success; the unsupervised correction can improve prediction when technical variables contribute the majority of variance in the data. As new and larger metagenomic datasets become increasingly available, background noise correction will become essential for generating reproducible microbiome analyses.
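Principal component correction can be sketched in plain Python as below. The sketch assumes that counts are first CLR-transformed (one common way to handle compositional data) and that the analyst chooses how many leading components (`n_pcs`) to remove. The paper's transformations, pseudocount and component selection may differ, and the function names are illustrative.

```python
import math
import random

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

def center_columns(X):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def top_pc(X, iters=200, seed=0):
    """Leading principal axis of a column-centered X via power iteration on X^T X."""
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]
    for _ in range(iters):
        scores = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]  # X v
        w = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return v, [0.0] * n  # no variance left to remove
        v = [x / norm for x in w]
    scores = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores

def remove_pcs(X, n_pcs):
    """Center X, then subtract its projection on the top `n_pcs` principal components."""
    R = center_columns(X)
    for _ in range(n_pcs):
        v, scores = top_pc(R)
        R = [[R[i][j] - scores[i] * v[j] for j in range(len(v))] for i in range(len(R))]
    return R
```

A typical call would be `remove_pcs([clr(sample) for sample in count_table], n_pcs=2)`, where `count_table` is a hypothetical samples-by-taxa list of counts. The top components are removed without reference to any labels, so no knowledge of the technical variables is needed. The trade-off is that phenotype signal can also be removed when it dominates the variance, which fits the abstract's caveat about prediction.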
Affiliation(s)
- Leah Briscoe
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, California, United States of America
- Brunilda Balliu
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Sriram Sankararaman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
- Eran Halperin
  - Department of Computer Science, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
  - Institute of Precision Health, University of California Los Angeles, Los Angeles, California, United States of America
- Nandita R. Garud
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, United States of America
4
Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021; 49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021]
Abstract
Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or better able to evade the immune system. Early detection and identification of minority viral haplotypes may therefore help to promptly adjust the patient’s treatment plan, preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. Eliminating sequencing noise is a non-trivial task that remains an open problem. Here we propose CliqueSNV, which is based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables the identification of minority haplotypes with frequencies below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short-read and long-read sequencing platforms.
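The core idea of finding statistically linked mutation pairs can be illustrated with a small sketch. For two SNV positions, count the reads that carry both minor alleles, then compare that count with the number expected if the alleles occurred independently. Several parts of the sketch are assumptions made for illustration: the Poisson tail test, the `alpha` threshold and the read representation (a dict of position to base). CliqueSNV's actual statistic differs. Its later steps, which (as the name suggests) work on a graph of linked SNVs to assemble haplotypes, are not shown.

```python
import math
from itertools import combinations

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if lam == 0:
        return 0.0
    cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - cdf)

def linked_pairs(reads, positions, minor, alpha=1e-6):
    """Find SNV pairs whose minor alleles co-occur on reads more often than expected.

    reads: list of dicts {position: base} for the aligned positions each read covers.
    minor: dict {position: minor-allele base}.
    """
    linked = []
    for i, j in combinations(sorted(positions), 2):
        covering = [r for r in reads if i in r and j in r]
        n = len(covering)
        if n == 0:
            continue
        n_i = sum(r[i] == minor[i] for r in covering)
        n_j = sum(r[j] == minor[j] for r in covering)
        n_ij = sum(r[i] == minor[i] and r[j] == minor[j] for r in covering)
        expected = n_i * n_j / n  # co-occurrence expected if the alleles were independent
        if n_ij > 0 and poisson_sf(n_ij, expected) < alpha:
            linked.append((i, j))
    return linked


# Toy data: 80 reads of a major haplotype, 20 of a minor one (T at 10 and G at 20),
# plus three scattered "A" errors at position 30.
reads = [{10: "A", 20: "C", 30: "G"} for _ in range(80)] + \
        [{10: "T", 20: "G", 30: "G"} for _ in range(20)]
for idx in (0, 1, 80):
    reads[idx][30] = "A"
print(linked_pairs(reads, [10, 20, 30], {10: "T", 20: "G", 30: "A"}))
```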
Affiliation(s)
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
  - Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
- Viachaslau Tsyvina
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Anupama Shankar
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Andrew Melnyk
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Tatiana Malygina
  - International Scientific and Research Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
- Yuri B Porozov
  - World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
  - Department of Computational Biology, Sirius University of Science and Technology, Sochi 354340, Russia
- Ellsworth M Campbell
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- William M Switzer
  - Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
  - World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
5
Akiyama MJ, Lipsey D, Ganova-Raeva L, Punkova LT, Agyemang L, Sue A, Ramachandran S, Khudyakov Y, Litwin AH. A Phylogenetic Analysis of Hepatitis C Virus Transmission, Relapse, and Reinfection Among People Who Inject Drugs Receiving Opioid Agonist Therapy. J Infect Dis 2021; 222:488-498. [PMID: 32150621 DOI: 10.1093/infdis/jiaa100] [Received: 11/18/2019] [Accepted: 03/03/2020]
Abstract
BACKGROUND Understanding hepatitis C virus (HCV) transmission among people who inject drugs (PWID) is essential for HCV elimination. We aimed to differentiate reinfections from treatment failures and to identify transmission linkages and associated factors in a cohort of PWID receiving opioid agonist therapy (OAT). METHODS We analyzed baseline and follow-up specimens from 150 PWID from 3 OAT clinics in the Bronx, New York. Next-generation sequencing data from hypervariable region 1 of HCV were analyzed using Global Hepatitis Outbreak and Surveillance Technology. RESULTS There were 3 transmission linkages between study participants. Sustained virologic response (SVR) was not achieved in 9 participants: 7 had follow-up specimens with sequences similar to baseline, and 2 died. In 4 additional participants, SVR was achieved but the participants were viremic at later follow-up: 2 were reinfected with different strains, 1 had a late treatment failure, and 1 was transiently viremic 17 months after treatment. All transmission linkages were from the same OAT clinic and involved spousal or common-law partnerships. CONCLUSION This study highlights next-generation sequencing as an important tool for identifying viral transmission and for distinguishing relapse from reinfection among PWID. The results reinforce the need for harm reduction interventions among couples and among those who report ongoing risk factors after SVR.
Affiliation(s)
- Daniel Lipsey
  - Montefiore Medical Center/Albert Einstein College of Medicine
- Lili T Punkova
  - Centers for Disease Control, Division of Viral Hepatitis
- Linda Agyemang
  - Montefiore Medical Center/Albert Einstein College of Medicine
- Amanda Sue
  - Centers for Disease Control, Division of Viral Hepatitis
- Yury Khudyakov
  - Centers for Disease Control, Division of Viral Hepatitis
- Alain H Litwin
  - Prisma Health, University of South Carolina School of Medicine, Clemson University School of Health Research
6
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020]
Abstract
The high coverage offered by next-generation sequencing (NGS) technology has made it possible to assess the complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, NGS datasets can be analyzed to extract and infer crucial epidemiological and biomedical information at the level of both infected individuals and susceptible populations, enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and the structures of transmission networks. However, NGS data require sophisticated analysis that deals with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology. They must now be redesigned to handle large-scale NGS datasets and to properly model the evolution of heterogeneous, rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools that analyze NGS data for (i) characterization of intra-host viral population complexity, including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response to and control of outbreaks.
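The single nucleotide variant calling step in (i) can be illustrated with a minimal caller. It flags a non-consensus base when that base's count at a position is improbably high under a binomial model of sequencing error. This is a toy sketch that assumes independent errors at a fixed, known `error_rate`. The tools surveyed use more elaborate error models, and the `pileup` input format here is hypothetical.

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), 0 < p < 1, with terms computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

def call_snvs(pileup, error_rate=0.01, alpha=1e-6):
    """Report non-consensus bases seen more often than sequencing error would explain.

    pileup: dict {position: list of bases observed in reads at that position}.
    error_rate: assumed probability that a read shows a given wrong base.
    Returns a list of (position, base, frequency).
    """
    calls = []
    for pos in sorted(pileup):
        bases = pileup[pos]
        n = len(bases)
        counts = {b: bases.count(b) for b in sorted(set(bases))}
        consensus = max(counts, key=counts.get)
        for base, k in counts.items():
            if base != consensus and binom_sf(k, n, error_rate) < alpha:
                calls.append((pos, base, k / n))
    return calls


pileup = {1: ["A"] * 80 + ["G"] * 20,  # a real 20% minority variant
          2: ["C"] * 99 + ["T"]}       # a single read disagreeing: plausible error
print(call_snvs(pileup))
```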
7
Icer Baykal PB, Lara J, Khudyakov Y, Zelikovsky A, Skums P. Quantitative differences between intra-host HCV populations from persons with recently established and persistent infections. Virus Evol 2020; 7:veaa103. [PMID: 33505710 PMCID: PMC7816669 DOI: 10.1093/ve/veaa103]
Abstract
Detection of incident hepatitis C virus (HCV) infections is crucial for identifying outbreaks and developing public health interventions. However, no single diagnostic assay can distinguish recent from persistent HCV infections. HCV exists in each infected host as a heterogeneous population of genomic variants, whose evolutionary dynamics remain incompletely understood. Genetic analysis of such viral populations can be applied to detect incident HCV infections and to understand intra-host viral evolution. We studied intra-host HCV populations, sampled using next-generation sequencing, from 98 recently and 256 persistently infected individuals. The genetic structure of the populations was evaluated using 245,878 viral sequences from these individuals and a set of selected features. These features measure the populations' diversity, topological structure, complexity, strength of selection, epistasis, evolutionary dynamics and physico-chemical properties. The distributions of these features differ significantly between recent and persistent infections. A general increase in viral genetic diversity from recent to persistent infections is frequently accompanied by a decline in genomic complexity and an increase in the structuredness of the HCV population. This likely reflects a high level of intra-host adaptation at later stages of infection. Using these findings, we developed a machine learning classifier for infection staging, which yielded a detection accuracy of 95.22 per cent, higher than that of other genomic-based models. The strong association detected between several HCV genetic factors and stages of infection suggests that the intra-host HCV population develops in a complex but regular and predictable manner over the course of infection. The proposed models may serve as a foundation for cyber-molecular assays for staging infection, which could potentially complement and/or substitute for standard laboratory assays.
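As an illustration of population-level diversity features of the kind mentioned, two simple ones might be computed as below. The choice of these two is an assumption: Shannon entropy of haplotype frequencies and mean pairwise Hamming distance over all pairs of reads. The paper's actual feature definitions may differ, and the input format (aligned haplotypes with read counts) is hypothetical.

```python
import math
from itertools import combinations

def shannon_entropy(counts):
    """Shannon entropy (natural log) of haplotype frequencies given read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def mean_pairwise_distance(haplotypes, counts):
    """Average per-site Hamming distance over all pairs of reads.

    Reads sharing a haplotype contribute distance 0, so only pairs drawn from
    different haplotypes (c1 * c2 of them) add to the sum.
    """
    length = len(haplotypes[0])
    total = sum(counts)
    pair_count = total * (total - 1) / 2
    if pair_count == 0:
        return 0.0
    acc = 0.0
    for (h1, c1), (h2, c2) in combinations(zip(haplotypes, counts), 2):
        d = sum(a != b for a, b in zip(h1, h2)) / length
        acc += c1 * c2 * d
    return acc / pair_count


haps, counts = ["AAAA", "AAAT"], [2, 2]  # two haplotypes, two reads each
print(shannon_entropy(counts), mean_pairwise_distance(haps, counts))
```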
Affiliation(s)
- Pelin B Icer Baykal
  - Department of Computer Science, Georgia State University, 25 Park Place, Atlanta, GA 30302, USA
- James Lara
  - Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Rd., Atlanta, GA 30329, USA
- Yury Khudyakov
  - Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Rd., Atlanta, GA 30329, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, 25 Park Place, Atlanta, GA 30302, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, 25 Park Place, Atlanta, GA 30302, USA
8
Basodi S, Baykal PI, Zelikovsky A, Skums P, Pan Y. Analysis of heterogeneous genomic samples using image normalization and machine learning. BMC Genomics 2020; 21:405. [PMID: 33349236 PMCID: PMC7751093 DOI: 10.1186/s12864-020-6661-6] [Received: 03/05/2020] [Accepted: 03/09/2020]
Abstract
BACKGROUND Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Machine learning models are increasingly employed to analyze sequence data from such populations. Their straightforward application is impeded, however, by multiple challenges: technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures. RESULTS We propose a novel preprocessing approach that transforms irregular genomic data into normalized image data. This representation allows the classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method was applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering was performed on data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy. CONCLUSIONS The sequence image normalization method allows a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in visualizing genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are as accurate as, or more accurate than, other models.
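One simple way to turn a variable-sized set of aligned sequences into a fixed-size numeric "image" is to build a nucleotide-by-position frequency matrix and then interpolate its columns to a common width. This is a hypothetical stand-in, not the paper's normalization procedure. The alphabet, the gap handling (gaps are ignored) and the target `width` are assumptions.

```python
def profile_matrix(seqs, counts, alphabet="ACGT"):
    """len(alphabet) x L matrix: frequency of each base at each aligned position."""
    length, total = len(seqs[0]), sum(counts)
    row_of = {b: r for r, b in enumerate(alphabet)}
    M = [[0.0] * length for _ in alphabet]
    for seq, c in zip(seqs, counts):
        for j, base in enumerate(seq):
            if base in row_of:  # characters outside the alphabet (e.g. gaps) are ignored
                M[row_of[base]][j] += c / total
    return M

def resize_columns(M, width):
    """Linearly interpolate each row to `width` columns so that samples of
    different lengths map onto images of the same size."""
    length = len(M[0])
    out = []
    for row in M:
        new_row = []
        for k in range(width):
            x = k * (length - 1) / (width - 1) if width > 1 else 0.0
            lo = int(x)
            hi = min(lo + 1, length - 1)
            t = x - lo
            new_row.append(row[lo] * (1 - t) + row[hi] * t)
        out.append(new_row)
    return out


image = resize_columns(profile_matrix(["ACGT", "ACGA"], [3, 1]), 8)
print(len(image), len(image[0]))  # 4 rows (A, C, G, T) x 8 columns
```

Interpolation mixes two neighboring columns with weights that sum to 1, so each column of the resized image still sums to 1 when the input has no gaps.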
Affiliation(s)
- Sunitha Basodi
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Pelin Icer Baykal
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 11991, Russia
- Pavel Skums
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Yi Pan
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
9
Abstract
BACKGROUND Single-cell sequencing experiments use short DNA barcode 'tags' to identify reads that originate from the same cell. To recover single-cell information from such experiments, reads must be grouped by their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult because of the high rates of mismatch and deletion errors that can afflict barcodes. RESULTS Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows reads to be assigned to consensus fingerprints constructed from k-mers. CONCLUSION We show that, for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there is a high number of errors per read. The approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.
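The observation about circularization can be checked with a toy example: a 12-base barcode with one substitution in the middle, and k = 10. Every linear 10-mer of the erroneous barcode overlaps the error. Reading the barcode as a circle, however, yields some 10-mers that skip it. The barcode sequences are made up, and the sketch shows only this k-mer observation, not Sircel's de Bruijn graph traversal.

```python
def linear_kmers(seq, k):
    """Ordinary k-mers: len(seq) - k + 1 windows."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def circular_kmers(seq, k):
    """k-mers of seq read as a circle: one window per start position (needs k <= len(seq))."""
    wrapped = seq + seq[:k - 1]
    return [wrapped[i:i + k] for i in range(len(seq))]

def shared(read_kmers, true_kmers):
    """How many distinct k-mers from the read also occur in the true barcode."""
    return len(set(read_kmers) & set(true_kmers))


true_bc = "AACGTTGCAGTC"  # hypothetical 12-base barcode
read_bc = "AACGTAGCAGTC"  # same barcode with a T->A substitution at index 5
k = 10
print(shared(linear_kmers(read_bc, k), linear_kmers(true_bc, k)))      # 0 error-free k-mers
print(shared(circular_kmers(read_bc, k), circular_kmers(true_bc, k)))  # 2 error-free k-mers
```

Each circular 10-mer covers 10 of the 12 positions, so exactly two windows avoid any single error. Those surviving k-mers are what a downstream step could use to match the read to its barcode.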
Affiliation(s)
- Akshay Tambe
  - Division of Biology and Biological Engineering, California Institute of Technology, 116 Kerckhoff Laboratory, Pasadena, CA 91125 USA
- Lior Pachter
  - Departments of Biology and Computing & Mathematical Sciences, California Institute of Technology, 116 Kerckhoff Laboratory, Pasadena, CA 91125 USA
10
Ramachandran S, Thai H, Forbi JC, Galang RR, Dimitrova Z, Xia GL, Lin Y, Punkova LT, Pontones PR, Gentry J, Blosser SJ, Lovchik J, Switzer WM, Teshale E, Peters P, Ward J, Khudyakov Y. A large HCV transmission network enabled a fast-growing HIV outbreak in rural Indiana, 2015. EBioMedicine 2018; 37:374-381. [PMID: 30448155 PMCID: PMC6284413 DOI: 10.1016/j.ebiom.2018.10.007] [Received: 08/01/2018] [Accepted: 10/02/2018]
Abstract
Background A high prevalence (92.3%) of hepatitis C virus (HCV) co-infection among HIV patients identified during a large HIV outbreak associated with injection of oxymorphone in Indiana prompted genetic analysis of HCV strains. Methods Molecular epidemiological analysis of HCV-positive samples included genotyping, sampling intra-host HVR1 variants by next-generation sequencing (NGS) and constructing transmission networks using Global Hepatitis Outbreak and Surveillance Technology (GHOST). Findings Results from the 492 samples indicate a predominance of HCV genotypes 1a (72.2%) and 3a (20.4%) and the existence of 2 major endemic NS5B clusters involving 49.8% of the sequenced strains. Among 76 HIV co-infected patients, 60.5% segregated into 2 endemic clusters. NGS analyses of 281 cases identified 826,917 unique HVR1 sequences and 51 cases of mixed subtype/genotype infections. GHOST mapped 23 transmission clusters. One large cluster (n = 130) included 50 cases infected with ≥2 subtypes/genotypes and 43 cases co-infected with HIV. Rapid strain replacement and superinfection with different strains were found among 7 of the 12 cases who were followed up. Interpretation GHOST enabled mapping of HCV transmission networks among persons who inject drugs (PWID). The numerous transmission clusters, mixed-genotype infections and rapid succession of infections with different HCV strains indicate a high rate of HCV spread. Co-localization of HIV co-infected patients in the major HCV clusters suggests that HIV dissemination was enabled by existing HCV transmission networks that likely perpetuated HCV in the community for years. Identification of transmission networks is an important step toward guiding efficient public health interventions for preventing and interrupting HCV and HIV transmission among PWID. Fund US Centers for Disease Control and Prevention, and US state and local public health departments.
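A distance-threshold transmission network of the general kind described can be sketched as below. Two samples are linked when some intra-host variant of one lies within a per-site Hamming distance `threshold` of some variant of the other. Clusters are then the connected components of the resulting graph. The threshold value and this exact linkage rule are assumptions for illustration, not GHOST's documented criteria, and the toy 10-base "variants" are made up.

```python
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def min_distance(pop_a, pop_b):
    """Smallest per-site Hamming distance between any variant of A and any of B."""
    return min(hamming(a, b) / len(a) for a in pop_a for b in pop_b)

def transmission_clusters(samples, threshold):
    """samples: {sample_id: [equal-length variant sequences]}.
    Link samples whose closest variants are within `threshold` and return the
    connected components (clusters) as sorted lists of sample IDs."""
    ids = sorted(samples)
    parent = {s: s for s in ids}

    def find(s):  # union-find root with path halving
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(ids, 2):
        if min_distance(samples[a], samples[b]) <= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for s in ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(clusters.values())


samples = {
    "P1": ["AAAAAAAAAA", "AAAAAAAAAT"],
    "P2": ["AAAAAAAATT"],
    "P3": ["CCCCCCCCCC"],
    "P4": ["CCCCCCCCCA", "GGGGGGGGGG"],
}
print(transmission_clusters(samples, threshold=0.1))
```

Comparing the closest pair of variants, rather than consensus sequences, lets a link be detected even when only a minority variant was transmitted. This is one reason intra-host NGS data help here.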
Affiliation(s)
- Sumathi Ramachandran
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Hong Thai
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Joseph C Forbi
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Romeo Regi Galang
  - Epidemic Intelligence Service, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Zoya Dimitrova
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Guo-Liang Xia
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Yulin Lin
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Lili T Punkova
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- William M Switzer
  - Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention, USA
- Eyasu Teshale
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Philip Peters
  - Centers for Disease Control and Prevention, Division of HIV/AIDS Prevention, USA
- John Ward
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
- Yury Khudyakov
  - Centers for Disease Control and Prevention, Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, USA
11
Saleem S, Ali A, Khubaib B, Akram M, Fatima Z, Idrees M. Genetic diversity of Hepatitis C Virus in Pakistan using Next Generation Sequencing. J Clin Virol 2018; 108:26-31. [PMID: 30219747 DOI: 10.1016/j.jcv.2018.09.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/03/2018] [Revised: 08/14/2018] [Accepted: 09/07/2018] [Indexed: 01/06/2023]
Abstract
BACKGROUND In Pakistan, HCV infection is considered a major public health issue: about 10-17 million people are infected, and the infection rate continues to rise unchecked. Pyrosequencing is currently used to analyze complex viral genomes because it can detect minor variants, and it is crucial to understand viral evolution and quasispecies diversity in complex viral strains. OBJECTIVES To assess genetic diversity in patients with HCV using Next Generation Sequencing (NGS) and to compare the nucleotide diversity of genotype 3a with that of other genotypes. STUDY DESIGN Intra-host viral diversity of HCV was determined using NGS in 13 chronically HCV-infected individuals. NGS of three different regions of HCV-3a (E2 (HVR1), NS3 and NS5B) allowed a comprehensive analysis of the viral population. RESULT Phylogenetic analysis of different HCV genes revealed great variability within the Pakistani population. The average nucleotide diversity for HVR1, NS3 and NS5B was 0.029, 0.011 and 0.010, respectively. CONCLUSION Phylogenetic and one-step network analyses indicate that patient 2 had greater quasispecies heterogeneity than the other patients infected with the same genotype (3a). Initial phylogenetic analysis of the three genes suggested that genotype 3a samples have greater genetic diversity. However, no significant difference was found when the nucleotide variability of genotype 3a was compared with that of other genotypes (1a, 1b, 2a and 4a).
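Illustrative sketch (not code from the cited study): the abstract reports average nucleotide diversity per region but does not state the estimator used. A common definition is the mean proportion of differing sites over all pairs of sequences. The Python below assumes reads already aligned to equal length and collapsed into unique haplotypes with read counts; the function names and input format are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(haplotype_counts: dict[str, int]) -> float:
    """Mean pairwise p-distance over all pairs of reads.

    Assumed input: each unique aligned haplotype mapped to its read count.
    Pairs of identical reads add zero distance but still count as pairs.
    """
    n_reads = sum(haplotype_counts.values())
    total_pairs = n_reads * (n_reads - 1) / 2
    if total_pairs == 0:
        return 0.0
    weighted = sum(
        haplotype_counts[a] * haplotype_counts[b] * p_distance(a, b)
        for a, b in combinations(haplotype_counts, 2)
    )
    return weighted / total_pairs


if __name__ == "__main__":
    # Toy haplotypes, not data from the study.
    print(nucleotide_diversity({"ACGT": 2, "ACGA": 1, "TCGA": 1}))
```

Weighting each distinct haplotype pair by the product of its read counts gives the same result as comparing every read with every other read, without materializing the reads.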
Affiliation(s)
- Sana Saleem
- Division of Molecular Virology and Molecular Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore 87-West Canal Bank Road Thokar Niaz Baig, Lahore, Pakistan
- Amjad Ali
- Molecular Virology Laboratory, Centre for Applied Molecular Biology (CAMB), University of the Punjab, Lahore 87-West Canal Bank Road Thokar Niaz Baig, Lahore, Pakistan
- Bushra Khubaib
- Division of Molecular Virology and Molecular Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore 87-West Canal Bank Road Thokar Niaz Baig, Lahore, Pakistan; Department of Biotechnology, Lahore College for Women University, Lahore, Pakistan
- Madiha Akram
- Division of Molecular Virology and Molecular Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore 87-West Canal Bank Road Thokar Niaz Baig, Lahore, Pakistan; Department of Biotechnology, Lahore College for Women University, Lahore, Pakistan
- Zareen Fatima
- Division of Molecular Virology and Molecular Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore 87-West Canal Bank Road Thokar Niaz Baig, Lahore, Pakistan; Bioinformatics & Biotechnology, International Islamic University, Sector H-10, New Campus, Islamabad, Pakistan
- Muhammad Idrees
- Division of Molecular Virology and Molecular Centre of Excellence in Molecular Biology (CEMB), University of the Punjab, Lahore 87-West Canal Bank Road Thokar Niaz Baig, Lahore, Pakistan; Vice Chancellor, Hazara University, Mansehra, Khyber Pakhtunkhwa, Pakistan
12
Lara J, Teka MA, Sims S, Xia GL, Ramachandran S, Khudyakov Y. HCV adaptation to HIV coinfection. INFECTION GENETICS AND EVOLUTION 2018; 65:216-225. [PMID: 30075255 DOI: 10.1016/j.meegid.2018.07.039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/24/2018] [Revised: 07/25/2018] [Accepted: 07/30/2018] [Indexed: 02/07/2023]
Abstract
Human immunodeficiency virus (HIV) infection is rising as a leading cause of morbidity and mortality among hepatitis C virus (HCV)-infected patients. Both viruses interact in co-infected hosts, which may affect their intra-host evolution and potentially lead to differing genetic composition of viral populations in co-infected (CIP) and mono-infected (MIP) patients. Here, we investigate genetic differences between intra-host variants of the HCV hypervariable region 1 (HVR1) sampled from CIP and MIP. Nucleotide (nt) sequences of intra-host HCV HVR1 variants (N = 28,622) obtained from CIP (N = 112) and MIP (N = 176) were represented using 148 physical-chemical (PhyChem) indexes of DNA nt dimers. Significant (p < .0001) differences in the means and frequency distributions of 7 PhyChem properties were found between HVR1 variants from the two groups. Linear projection analysis of 29 PhyChem features extracted from these PhyChem properties showed that CIP and MIP HVR1 variants have distinct distributions in the modeled 2D space, with only ~1.3% of the PhyChem profiles observed across all HVR1 variants (N = 6782) being found in both groups. Probabilistic neural network (PNN) and naïve Bayesian (NB) classifiers trained on the PhyChem features accurately classified HVR1 variants by group in cross-validation experiments (AUROC ≥ 0.96). Both models also showed high accuracy (AUROC ≥ 0.95) when evaluated on a test dataset of HVR1 sequences obtained from 10 patients whose data were not used for model building. As expected, both models performed at lower accuracy on randomly labeled datasets in cross-validation experiments (AUROC = 0.50). The random-label-trained PNN showed a similar drop in accuracy on the test dataset (AUROC = 0.48), indicating that the detected associations were unlikely to be due to random correlations.
Marked differences in genetic composition of HCV HVR1 variants sampled from CIP and MIP suggest differing intra-host HCV evolution in the presence of HIV infection. PhyChem features identified here may be used for detection of HIV infection from intra-host HCV variants alone in co-infected patients, thus facilitating monitoring for HIV introduction to high-risk populations with high HCV prevalence.
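Illustrative sketch (not the authors' code): one way to represent an HVR1 variant by PhyChem indexes of nucleotide dimers is to average each property over the sequence's overlapping dinucleotide steps. The Python below assumes the caller supplies each index as a table mapping all 16 dinucleotides to a value; the function names are hypothetical, and the toy table is a placeholder, not one of the 148 published indexes.

```python
from collections import Counter
from itertools import product

# All 16 DNA dinucleotides (dimers).
DIMERS = ["".join(p) for p in product("ACGT", repeat=2)]


def dimer_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each overlapping dinucleotide step in seq."""
    seq = seq.upper()
    # Steps containing ambiguous bases (e.g. N) are skipped.
    steps = [seq[i:i + 2] for i in range(len(seq) - 1) if seq[i:i + 2] in DIMERS]
    counts = Counter(steps)
    total = len(steps)
    return {d: (counts[d] / total if total else 0.0) for d in DIMERS}


def phychem_profile(seq: str, indexes: dict[str, dict[str, float]]) -> dict[str, float]:
    """Mean value of each dimer property over the sequence's dinucleotide steps.

    indexes maps a property name to a table of values for all 16 dimers
    (hypothetical input; real index values must come from a curated source).
    """
    freqs = dimer_frequencies(seq)
    return {
        name: sum(freqs[d] * table[d] for d in DIMERS)
        for name, table in indexes.items()
    }


if __name__ == "__main__":
    # Placeholder property: 3.0 for the CG step, 0.0 for every other dimer.
    toy = {"cg_step": {d: (3.0 if d == "CG" else 0.0) for d in DIMERS}}
    print(phychem_profile("ACGT", toy))
```

Weighting each dimer's value by its step frequency is equivalent to averaging the property along the sequence, so sequences of different lengths yield comparable profiles.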
Affiliation(s)
- James Lara
- Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Mahder A Teka
- Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Seth Sims
- Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Guo-Liang Xia
- Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Sumathi Ramachandran
- Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
- Yury Khudyakov
- Centers for Disease Control, 1600 Clifton Road, Atlanta, GA 30333, United States
13
Hathaway NJ, Parobek CM, Juliano JJ, Bailey JA. SeekDeep: single-base resolution de novo clustering for amplicon deep sequencing. Nucleic Acids Res 2018; 46:e21. [PMID: 29202193 PMCID: PMC5829576 DOI: 10.1093/nar/gkx1201] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 01/19/2017] [Revised: 11/16/2017] [Accepted: 11/20/2017] [Indexed: 01/08/2023]
Abstract
PCR amplicon deep sequencing continues to transform the investigation of genetic diversity in viral, bacterial, and eukaryotic populations. In eukaryotic populations such as Plasmodium falciparum infections, it is important to discriminate sequences differing by a single nucleotide polymorphism. In bacterial populations, single-base resolution can provide improved resolution towards species and strains. Here, we introduce the SeekDeep suite built around the qluster algorithm, which is capable of accurately building de novo clusters representing true, biological local haplotypes differing by just a single base. It outperforms current software, particularly at low frequencies and at low input read depths, whether resolving single-base differences or traditional OTUs. SeekDeep is open source and works with all major sequencing technologies, making it broadly useful in a wide variety of applications of amplicon deep sequencing to extract accurate and maximal biologic information.
Affiliation(s)
- Nicholas J Hathaway
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Christian M Parobek
- Curriculum in Genetics and Molecular Biology, University of North Carolina School of Medicine, Chapel Hill, NC, USA
- Jonathan J Juliano
- Curriculum in Genetics and Molecular Biology, University of North Carolina School of Medicine, Chapel Hill, NC, USA
- Division of Infectious Diseases, Department of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Jeffrey A Bailey
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Division of Transfusion Medicine, Department of Medicine, University of Massachusetts Medical School, Worcester, MA, USA
14
Campo DS, Zhang J, Ramachandran S, Khudyakov Y. Transmissibility of intra-host hepatitis C virus variants. BMC Genomics 2017; 18:881. [PMID: 29244001 PMCID: PMC5731494 DOI: 10.1186/s12864-017-4267-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/16/2022]
Abstract
Background Intra-host hepatitis C virus (HCV) populations are genetically heterogeneous and organized in subpopulations. With the exception of blood transfusions, transmission of HCV occurs via a small number of genetic variants, the effect of which is frequently described as a bottleneck. Stochasticity of transmission associated with the bottleneck is usually used to explain genetic differences among HCV populations identified in the source and recipient cases, which may be further exacerbated by intra-host HCV evolution and differential biological capacity of HCV variants to successfully establish a population in a new host. Results Transmissibility was formulated as a property that can be measured from experimental Ultra-Deep Sequencing (UDS) data. The UDS data were obtained from one large hepatitis C outbreak involving an epidemiologically defined source and 18 recipient cases. k-Step networks of HCV variants were constructed and used to identify a potential association between transmissibility and network centrality of individual HCV variants from the source. An additional dataset obtained from nine other HCV outbreaks with known directionality of transmission was used for validation. Transmissibility was not found to be dependent on high frequency of variants in the source, supporting the earlier observations of transmission of minority variants. Among all tested measures of centrality, the highest correlation of transmissibility was found with Hamming centrality (r = 0.720; p = 1.57 E-71). Correlation between genetic distances and differences in transmissibility among HCV variants from the source was found to be 0.3276 (Mantel Test, p = 9.99 E-5), indicating association between genetic proximity and transmissibility. A strong correlation ranging from 0.565–0.947 was observed between Hamming centrality and transmissibility in 7 of the 9 additional transmission clusters (p < 0.05). Conclusions Transmission is not an exclusively stochastic process. 
Transmissibility, as formally measured in this study, is associated with certain biological properties that also define location of variants in the genetic space occupied by the HCV strain from the source. The measure may also be applicable to other highly heterogeneous viruses. Besides improving accuracy of outbreak investigations, this finding helps with the understanding of molecular mechanisms contributing to establishment of chronic HCV infection.
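Illustrative sketch (not the authors' implementation): the abstract does not define its k-step networks or Hamming centrality. Assuming, for illustration only, that a k-step network links two variants when their Hamming distance is at most k, with k chosen as the smallest value that connects the network, and that Hamming centrality is the reciprocal of a variant's mean Hamming distance to all other variants, the computation could look like the Python below. Both definitions and all function names are assumptions.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("variants must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def k_step_edges(variants: list[str], k: int) -> list[tuple[int, int]]:
    """Edges between variant indices whose Hamming distance is at most k."""
    return [
        (i, j)
        for i, j in combinations(range(len(variants)), 2)
        if hamming(variants[i], variants[j]) <= k
    ]


def is_connected(n: int, edges: list[tuple[int, int]]) -> bool:
    """Depth-first search from node 0; True if every node is reached."""
    if n <= 1:
        return True
    adjacency = {i: set() for i in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == n


def minimal_k(variants: list[str]) -> int:
    """Smallest k for which the k-step network over the variants is connected."""
    if len(variants) <= 1:
        return 0
    for k in range(1, len(variants[0]) + 1):
        if is_connected(len(variants), k_step_edges(variants, k)):
            return k
    raise ValueError("variants must be non-empty, equal-length sequences")


def hamming_centrality(variants: list[str]) -> list[float]:
    """Reciprocal of each variant's mean Hamming distance to the others (assumed definition)."""
    n = len(variants)
    scores = []
    for i in range(n):
        total = sum(hamming(variants[i], variants[j]) for j in range(n) if j != i)
        scores.append((n - 1) / total if total else float("inf"))
    return scores


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "TTTT"]  # toy variants, not study data
    print(minimal_k(toy), hamming_centrality(toy))
```

Under these assumptions, variants sitting in the middle of the genetic space (small mean distance to all others) receive the highest centrality, which is the kind of network position the abstract reports as correlated with transmissibility.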
Affiliation(s)
- David S Campo
- Division of Viral Hepatitis, Molecular Epidemiology and Bioinformatics, Centers for Disease Control and Prevention, Atlanta, GA, USA
- June Zhang
- Division of Viral Hepatitis, Molecular Epidemiology and Bioinformatics, Centers for Disease Control and Prevention, Atlanta, GA, USA; Department of Electrical Engineering, University of Hawaii, Manoa, HI, USA
- Sumathi Ramachandran
- Division of Viral Hepatitis, Molecular Epidemiology and Bioinformatics, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yury Khudyakov
- Division of Viral Hepatitis, Molecular Epidemiology and Bioinformatics, Centers for Disease Control and Prevention, Atlanta, GA, USA
15
Malhotra R, Jha M, Poss M, Acharya R. A random forest classifier for detecting rare variants in NGS data from viral populations. Comput Struct Biotechnol J 2017; 15:388-395. [PMID: 28819548 PMCID: PMC5548337 DOI: 10.1016/j.csbj.2017.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/14/2017] [Revised: 07/01/2017] [Accepted: 07/03/2017] [Indexed: 11/28/2022]
Abstract
We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of k-mers of varying lengths from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies k-mers as erroneous or as rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of k-mers. Frames are sets of non-orthogonal basis functions that were traditionally used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that k-mers of a given size constitute a frame. We evaluate MultiRes on simulated and real viral population datasets, which contain many low-frequency variants, and compare it to the error detection methods used in correction tools known in the literature. MultiRes makes 4 to 500 times fewer false-positive k-mer predictions than other methods, which is essential for accurate estimation of viral population diversity and for de novo assembly. Its recall of true k-mers is high and comparable to that of other error correction methods. MultiRes also has greater than 95% recall for detecting single nucleotide polymorphisms (SNPs) and produces fewer false-positive SNPs, while detecting a higher number of rare variants than other variant calling methods for viral populations. The software is freely available on GitHub at https://github.com/raunaq-m/MultiRes.
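Illustrative sketch (not MultiRes itself): the classifier's input is built from counts of k-mers of several lengths taken from the reads. The Python below only tabulates those counts and derives one hypothetical feature vector per k-mer (its own count plus, for each shorter length, the smallest count among its sub-k-mers). The actual MultiRes features, frame representation and random forest are not reproduced, and all function names are assumptions.

```python
from collections import Counter


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count every overlapping k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def multires_counts(reads: list[str], ks: list[int]) -> dict[int, Counter]:
    """k-mer count tables at several resolutions (k values)."""
    return {k: count_kmers(reads, k) for k in ks}


def kmer_features(kmer: str, tables: dict[int, Counter]) -> list[int]:
    """Hypothetical feature vector for one k-mer.

    First entry: the k-mer's own count (len(kmer) must be one of the table sizes).
    Then, for each shorter k in the tables (ascending), the minimum count
    over the k-mer's overlapping sub-k-mers of that length.
    """
    features = [tables[len(kmer)][kmer]]
    for k in sorted(t for t in tables if t < len(kmer)):
        subs = [kmer[i:i + k] for i in range(len(kmer) - k + 1)]
        features.append(min(tables[k][s] for s in subs))
    return features


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGAAC"]  # toy reads, not study data
    tables = multires_counts(reads, [3, 4])
    print(kmer_features("CGTA", tables), kmer_features("CGAA", tables))
```

Vectors like these could then be passed, with error/variant labels from simulated data, to any random forest implementation; how MultiRes actually defines and labels its features is described in the paper, not here.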
Affiliation(s)
- Raunaq Malhotra
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA
- Manjari Jha
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA, 16802, USA
- Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Raj Acharya
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
16
Rytsareva I, Campo DS, Zheng Y, Sims S, Thankachan SV, Tetik C, Chirag J, Chockalingam SP, Sue A, Aluru S, Khudyakov Y. Efficient detection of viral transmissions with Next-Generation Sequencing data. BMC Genomics 2017; 18:372. [PMID: 28589864 PMCID: PMC5461558 DOI: 10.1186/s12864-017-3732-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/23/2022]
Abstract
BACKGROUND Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections associated with unsafe injection practices, drug diversion, and other exposures to blood are difficult to detect and investigate. Molecular analysis has frequently been used to study HCV outbreaks and transmission chains, helping identify a cluster of sequences as linked by transmission if their genetic distances are below a previously defined threshold. However, HCV exists as a population of numerous variants in each infected individual, and minority variants in the source have often been observed to be the ones responsible for transmission; this precludes the use of a single sequence per individual, because many such transmissions would be missed. Next-Generation Sequencing greatly increases the sensitivity of transmission detection but poses a considerable computational challenge, because all sequences must be compared across all pairs of samples. METHODS We developed a three-step strategy that filters pairs of samples according to different criteria: (i) a k-mer Bloom filter, (ii) a Levenshtein filter and (iii) a filter of identical sequences. We applied these three filters to a set of samples covering the spectrum of genetic relationships among HCV cases, from membership in the same transmission cluster to membership in different subtypes. RESULTS Our three-step filtering strategy rapidly removes 85.1% of all pairwise sample comparisons and 91.0% of all pairwise sequence comparisons, accurately establishing which pairs of HCV samples are below the relatedness threshold. CONCLUSIONS We present a fast and efficient three-step filtering strategy that removes most sequence comparisons and accurately establishes transmission links for any threshold-based method.
This highly efficient workflow will allow a faster response and molecular detection capacity, improving the rate of detection of viral transmissions with molecular data.
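Illustrative sketch (not the published workflow): the Python below shows how the three filters named in the abstract could be chained for one pair of samples at a Levenshtein threshold t. Two substitutions are ours and should be read as assumptions: an exact multiset of k-mers (q-grams) stands in for the Bloom filter, and the q-gram lemma (two strings within edit distance t share at least max(|a|, |b|) - q + 1 - q·t q-grams) decides which sequence pairs can be skipped without computing the Levenshtein distance. The order of the steps and all function names are hypothetical.

```python
from collections import Counter


def qgrams(seq: str, q: int) -> Counter:
    """Multiset of overlapping q-grams (k-mers) in seq."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def passes_qgram_filter(a: str, b: str, t: int, q: int) -> bool:
    """False only if a and b cannot be within edit distance t (q-gram lemma)."""
    shared = sum((qgrams(a, q) & qgrams(b, q)).values())
    return shared >= max(len(a), len(b)) - q + 1 - q * t


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, keeping one previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def samples_linked(sample_a: list[str], sample_b: list[str], t: int, q: int = 3) -> bool:
    """True if any sequence in sample_a is within edit distance t of one in sample_b."""
    # Identical-sequence check: a shared sequence has distance 0.
    if set(sample_a) & set(sample_b):
        return True
    for a in sample_a:
        for b in sample_b:
            # Cheap k-mer filter first; exact Levenshtein only for survivors.
            if passes_qgram_filter(a, b, t, q) and levenshtein(a, b) <= t:
                return True
    return False


if __name__ == "__main__":
    # Toy samples, not HCV data.
    print(samples_linked(["ACGTACGT"], ["ACGTACGA"], t=1))
    print(samples_linked(["AAAAAAAA"], ["CCCCCCCC"], t=1))
```

Because the q-gram bound never rejects a pair that is truly within distance t, the filter only saves work and does not change which samples are reported as linked; a real Bloom filter would add a small false-positive rate but keep that property.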
Affiliation(s)
- Inna Rytsareva
- Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- David S Campo
- Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yueli Zheng
- Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Seth Sims
- Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Sharma V Thankachan
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA; Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Cansu Tetik
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Jain Chirag
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Sriram P Chockalingam
- Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA, USA
- Amanda Sue
- Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Srinivas Aluru
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA; Institute for Data Engineering and Science, Georgia Institute of Technology, Atlanta, GA, USA
- Yury Khudyakov
- Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
17
Palmer BA, Dimitrova Z, Skums P, Crosbie O, Kenny-Walsh E, Fanning LJ. Ultradeep Pyrosequencing of Hepatitis C Virus to Define Evolutionary Phenotypes. Bio Protoc 2017; 7:e2284. [PMID: 34541061 DOI: 10.21769/bioprotoc.2284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2016] [Revised: 02/19/2017] [Accepted: 04/20/2017] [Indexed: 11/02/2022]
Abstract
Analysis of hypervariable regions (HVR) using pyrosequencing techniques is hampered by the limited ability of error correction algorithms to account for the heterogeneity of the variants present. Analysis of between-sample fluctuations in virome sub-populations and detection of low-frequency variants are unreliable when arbitrary frequency cut-offs are applied. Cumulatively, this leads to an underestimation of genetic diversity. In the following technique we describe the analysis of Hepatitis C virus (HCV) HVR1, including the E1/E2 glycoprotein gene junction. This procedure describes the evolution of HCV in a treatment-naïve environment, from 10 samples collected over 10 years, using ultradeep pyrosequencing (UDPS) performed on the Roche GS FLX Titanium platform (Palmer et al., 2014). Initial clonal analysis of serum samples was used to inform downstream error correction algorithms, which allowed greater sequence depth to be reached. PCR amplification of this region has been tested for HCV genotypes 1, 2, 3 and 4.
Affiliation(s)
- Brendan A Palmer
- Molecular Virology Diagnostic & Research Laboratory, Department of Medicine, University College Cork, Cork, Ireland
- Zoya Dimitrova
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Pavel Skums
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Orla Crosbie
- Department of Gastroenterology, Cork University Hospital, Cork, Ireland
- Liam J Fanning
- Molecular Virology Diagnostic & Research Laboratory, Department of Medicine, University College Cork, Cork, Ireland
18
Palmer BA, Fanning LJ. Synonymous Co-Variation across the E1/E2 Gene Junction of Hepatitis C Virus Defines Virion Fitness. PLoS One 2016; 11:e0167089. [PMID: 27880830 PMCID: PMC5120871 DOI: 10.1371/journal.pone.0167089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2016] [Accepted: 11/07/2016] [Indexed: 11/18/2022]
Abstract
Hepatitis C virus is a positive-sense single-stranded RNA virus. The gene junction partitioning the viral glycoproteins E1 and E2 displays concurrent sequence evolution, with the 3'-end of E1 highly conserved and the 5'-end of E2 highly heterogeneous. This gene junction is also believed to contain structured RNA elements, with a growing body of evidence suggesting that such structures can act as an additional level of viral replication and transcriptional control. We have previously used ultradeep pyrosequencing to analyze an amplicon library spanning the E1/E2 gene junction from a treatment-naïve patient whose samples were collected over 10 years of chronic HCV infection. During this timeframe, maintenance of an in-frame insertion, recombination and humoral immune targeting of discrete virus sub-populations were reported. In the current study, we present evidence of epistatic evolution across the E1/E2 gene junction and observe the development of co-varying networks of codons set against a background of a complex virome with periodic shifts in population dominance. Over time, the number of actively mutating codons decreases for all virus groupings. We identify strong synonymous co-variation between codon sites in a group of sequences harbouring a 3 bp in-frame insertion and propose that synonymous mutation acts to stabilize the RNA structural backbone.
Affiliation(s)
- Brendan A. Palmer
- Molecular Virology Diagnostic & Research Laboratory, Department of Medicine, University College Cork, Cork, Ireland
- Liam J. Fanning
- Molecular Virology Diagnostic & Research Laboratory, Department of Medicine, University College Cork, Cork, Ireland
19
Hepatitis B virus resistance substitutions: long-term analysis by next-generation sequencing. Arch Virol 2016; 161:2885-91. [PMID: 27447462 DOI: 10.1007/s00705-016-2959-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/11/2016] [Accepted: 06/28/2016] [Indexed: 01/07/2023]
Abstract
HBV phylogenetics and resistance-associated mutations (RAMs) were surveyed by next-generation sequencing of 21 longitudinal samples from seven patients entering antiviral therapy. The virus populations were dominated by a few abundant lineages that coexisted with substantial numbers of low-frequency variants. A few low-frequency RAMs were observed before treatment, but new ones emerged and their frequencies increased during therapy. Together, these results support the idea that chronic HBV infection is dominated by a few virus lineages and that an accompanying plethora of diverse, low-frequency variants may function as a reservoir that contributes to viral genetic plasticity, potentially affecting patient outcome.
20
Campo DS, Roh HJ, Pearlman BL, Fierer DS, Ramachandran S, Vaughan G, Hinds A, Dimitrova Z, Skums P, Khudyakov Y. Increased Mitochondrial Genetic Diversity in Persons Infected With Hepatitis C Virus. Cell Mol Gastroenterol Hepatol 2016; 2:676-684. [PMID: 28174739 PMCID: PMC5042856 DOI: 10.1016/j.jcmgh.2016.05.012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/07/2016] [Accepted: 05/15/2016] [Indexed: 12/22/2022]
Abstract
BACKGROUND & AIMS The host genetic environment contributes significantly to the outcomes of hepatitis C virus (HCV) infection and therapy response, but little is known about any effects of HCV infection on the host beyond any changes related to adaptive immune responses. HCV persistence is associated strongly with mitochondrial dysfunction, with liver mitochondrial DNA (mtDNA) genetic diversity linked to disease progression. METHODS We evaluated the genetic diversity of 2 mtDNA genomic regions (hypervariable segments 1 and 2) obtained from sera of 116 persons using next-generation sequencing. RESULTS Results were as follows: (1) the average diversity among cases with seronegative acute HCV infection was 4.2 times higher than among uninfected controls; (2) the diversity level among cases with chronic HCV infection was 96.1 times higher than among uninfected controls; and (3) the diversity was 23.1 times higher among chronic than acute cases. In 2 patients who were followed up during combined interferon and ribavirin therapy, mtDNA nucleotide diversity decreased dramatically after the completion of therapy in both patients: by 100% in patient A after 54 days and by 70.51% in patient B after 76 days. CONCLUSIONS HCV infection strongly affects mtDNA genetic diversity. A rapid decrease in mtDNA genetic diversity observed after therapy-induced HCV clearance suggests that the effect is reversible, emphasizing dynamic genetic relationships between HCV and mitochondria. The level of mtDNA nucleotide diversity can be used to discriminate recent from past infections, which should facilitate the detection of recent transmission events and thus help identify modes of transmission.
Key Words
- AUC, area under the curve
- Disease Biomarkers
- HCC, hepatocellular carcinoma
- HCV, hepatitis C virus
- HIV, human immunodeficiency virus
- HVS, hypervariable segment
- IFN, interferon
- NGS, next-generation sequencing
- Noninvasive
- PCR, polymerase chain reaction
- ROC, receiver operating characteristic
- mtDNA
- mtDNA, mitochondrial DNA
- pegIFN, peginterferon
Affiliation(s)
- David S. Campo
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia. Correspondence: David S. Campo, PhD, Centers for Disease Control and Prevention, 1600 Clifton Road, MS A33, Atlanta, Georgia 30329; fax: (404) 639-1563
- Ha-Jung Roh
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Brian L. Pearlman
- Center for Hepatitis C, Atlanta Medical Center, Atlanta, Georgia; Medical College of Georgia, Augusta, Georgia; Emory School of Medicine, Atlanta, Georgia
- Daniel S. Fierer
- Division of Infectious Diseases, Icahn School of Medicine at Mount Sinai, New York, New York
- Sumathi Ramachandran
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Gilberto Vaughan
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Andrew Hinds
- Center for Hepatitis C, Atlanta Medical Center, Atlanta, Georgia
- Zoya Dimitrova
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Pavel Skums
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Yury Khudyakov
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
21
Sede M, Parra M, Manrique JM, Laufer N, Jones LR, Quarleri J. Evolution of hepatitis C virus in HIV coinfected patients under antiretroviral therapy. INFECTION GENETICS AND EVOLUTION 2016; 43:186-96. [PMID: 27234841 DOI: 10.1016/j.meegid.2016.05.032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/19/2016] [Revised: 05/13/2016] [Accepted: 05/23/2016] [Indexed: 02/07/2023]
Abstract
Five patients (P) were followed up for an average of 7.73 years after initiation of highly active antiretroviral therapy (HAART). Patients' immune and virological status was determined by periodic CD4+ T-cell counts and HIV and HCV viral loads. HCV populations were studied using longitudinal high-throughput sequence data obtained in parallel with the virological and immunological parameters. Two patients (P7, P28) with sub-optimal responses to HAART presented HCV viral loads significantly higher than those recorded for two patients (P1, P18) who achieved good responses to HAART. Interestingly, HCV populations from P7 and P28 displayed a stable phylogenetic structure, whereas HCV populations from P1 and P18 showed a significant increase in their phylogenetic structure, followed by a decrease after achieving acceptable CD4+ T-cell counts (>500 cells/μl). The fifth patient (P25) presented high HCV viral loads and CD4+ T-cell counts that were preserved from baseline throughout follow-up, and displayed a constant viral phylogenetic structure. These results strongly suggest that HAART-induced immune recovery induces a decrease in HCV viral load and an increase in HCV population phylogenetic structure, likely reflecting viral diversification in response to the renewed immune response. The relatively low HCV viral load observed in the HAART responder patients suggests that, once adapted, HCV reaches a maximum number of haplotypes higher than that achieved during the initial stages of the immune response, as inferred from the two recovering patients. Future studies with larger numbers of patients are needed to corroborate these hypotheses.
Affiliation(s)
- Mariano Sede
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina
- Micaela Parra
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina
- Julieta M Manrique
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, 9 de Julio y Belgrano S/N, 9100 Trelew, Chubut, Argentina
- Natalia Laufer
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina
- Leandro R Jones
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, 9 de Julio y Belgrano S/N, 9100 Trelew, Chubut, Argentina.
- Jorge Quarleri
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, C1083ACA Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11, C1121ABG Buenos Aires, Argentina.
22
Beltman JB, Urbanus J, Velds A, van Rooij N, Rohr JC, Naik SH, Schumacher TN. Reproducibility of Illumina platform deep sequencing errors allows accurate determination of DNA barcodes in cells. BMC Bioinformatics 2016; 17:151. [PMID: 27038897 PMCID: PMC4818877 DOI: 10.1186/s12859-016-0999-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/28/2015] [Accepted: 03/23/2016] [Indexed: 12/31/2022]
Abstract
Background: Next-generation sequencing (NGS) of amplified DNA is a powerful tool for describing genetic heterogeneity within cell populations; it can be used both to investigate the clonal structure of cell populations and to perform genetic lineage tracing. For applications in which both abundant and rare sequences are biologically relevant, the relatively high error rate of NGS techniques complicates data analysis, as it is difficult to distinguish rare true sequences from spurious sequences generated by PCR or sequencing errors. This issue applies, for instance, to cellular barcoding strategies that aim to follow the amount and type of offspring of single cells by supplying these cells with unique heritable DNA tags. Results: Here, we use genetic barcoding data from the Illumina HiSeq platform to show that straightforward read threshold-based filtering of data is typically insufficient to filter out spurious barcodes. Importantly, we demonstrate that specific sequencing errors occur at an approximately constant rate across different samples that are sequenced in parallel. We exploit this observation by developing a novel approach to filter out spurious sequences. Conclusions: Application of our new method demonstrates its value in the identification of true sequences amongst spurious sequences in biological data sets. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0999-4) contains supplementary material, which is available to authorized users.
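The abstract does not describe the filtering algorithm itself. The following is a minimal sketch of the underlying observation under the simplest model one could assume. The read ratio between a barcode and a more abundant "parent" barcode one substitution away is pooled over reference samples from the same sequencing run. A barcode in a new sample is then discarded when its count is consistent with that expected error rate. The function names, the pooling rule and the `tolerance` factor are hypothetical illustrations, not the published method. The reference samples are also assumed not to carry the child barcodes as genuine tags.

```python
# Hypothetical sketch of error-rate-based barcode filtering (not the published algorithm).
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def one_mismatch_pairs(barcodes):
    """Yield ordered (parent, child) pairs of equal-length barcodes one substitution apart."""
    ordered = sorted(barcodes)
    for p in ordered:
        for c in ordered:
            if p != c and len(p) == len(c) and hamming(p, c) == 1:
                yield p, c


def pooled_error_rates(reference_samples):
    """Pool child/parent read ratios over samples sequenced in the same run.

    reference_samples: list of dicts mapping barcode -> read count.
    A pair contributes only in samples where the parent outnumbers the child.
    """
    universe = set().union(*reference_samples)
    child_reads, parent_reads = defaultdict(int), defaultdict(int)
    for counts in reference_samples:
        for p, c in one_mismatch_pairs(universe):
            n_p, n_c = counts.get(p, 0), counts.get(c, 0)
            if n_p > n_c:
                child_reads[(p, c)] += n_c
                parent_reads[(p, c)] += n_p
    return {pair: child_reads[pair] / parent_reads[pair] for pair in parent_reads}


def filter_spurious(sample, rates, tolerance=3.0):
    """Keep barcodes whose counts are not explained as errors from a more abundant neighbour."""
    kept = {}
    for c, n_c in sample.items():
        explained = any(
            n_p > n_c
            and len(p) == len(c)
            and hamming(p, c) == 1
            and (p, c) in rates
            and n_c <= tolerance * rates[(p, c)] * n_p
            for p, n_p in sample.items()
        )
        if not explained:
            kept[c] = n_c
    return kept


reference = [{"AAAA": 1000, "AAAT": 10}, {"AAAA": 2000, "AAAT": 20}]
rates = pooled_error_rates(reference)  # pooled AAAA -> AAAT rate of 30/3000
print(filter_spurious({"AAAA": 500, "AAAT": 6, "CCCC": 40}, rates))
# {'AAAA': 500, 'CCCC': 40}
```

In this toy example, a fixed threshold of 5 reads would keep `AAAT`. The error-rate check removes it and keeps the isolated `CCCC`.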
Affiliation(s)
- Joost B Beltman
- Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands; Division of Toxicology, Leiden Academic Centre for Drug Research, Leiden University, 2333 CC, Leiden, The Netherlands.
- Jos Urbanus
- Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Arno Velds
- Genomics Core Facility, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Nienke van Rooij
- Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
- Jan C Rohr
- Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands; Center for Chronic Immunodeficiency (CCI), University Medical Center Freiburg and University of Freiburg, Freiburg, Germany
- Shalin H Naik
- Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands; Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC, 3052, Australia; Department of Medical Biology, The University of Melbourne, Parkville, VIC, 3010, Australia
- Ton N Schumacher
- Division of Immunology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
23
Laehnemann D, Borkhardt A, McHardy AC. Denoising DNA deep sequencing data-high-throughput sequencing errors and their correction. Brief Bioinform 2016; 17:154-79. [PMID: 26026159 PMCID: PMC4719071 DOI: 10.1093/bib/bbv029] [Citation(s) in RCA: 177] [Impact Index Per Article: 22.1] [Received: 03/03/2015] [Revised: 04/09/2015] [Indexed: 12/23/2022]
Abstract
Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. A large variety of programs is available for error removal in sequencing read data; they differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them, and the data structures and algorithms they use. We highlight the assumptions they make and the data types for which these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.
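One concrete family of the error models the review discusses is k-mer spectrum correction. These methods flag k-mers that occur too rarely across a dataset as likely errors. The sketch below is a generic illustration under two assumptions: error-free k-mers are seen at least `min_count` times, and at most one substitution is allowed per read. It is not the algorithm of any specific tool surveyed. The values of `k` and `min_count`, and the greedy search order, are arbitrary choices.

```python
# Generic k-mer spectrum sketch (illustrative; not a specific published tool).
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_positions(read, counts, k, min_count):
    """Start positions of k-mers in `read` seen fewer than `min_count` times."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


def correct_single_substitution(read, counts, k, min_count, alphabet="ACGT"):
    """Return the first single-base substitution that makes every k-mer trusted.

    Reads that are already fully trusted, or that cannot be fixed with one
    substitution, are returned unchanged.
    """
    if not untrusted_positions(read, counts, k, min_count):
        return read
    for i, base in enumerate(read):
        for alt in alphabet:
            if alt == base:
                continue
            candidate = read[:i] + alt + read[i + 1:]
            if not untrusted_positions(candidate, counts, k, min_count):
                return candidate
    return read


reads = ["ACGTACGT"] * 9 + ["ACGTTCGT"]  # one read with a substitution at index 4
counts = kmer_counts(reads, k=4)
print(untrusted_positions("ACGTTCGT", counts, 4, 2))          # [1, 2, 3, 4]
print(correct_single_substitution("ACGTTCGT", counts, 4, 2))  # ACGTACGT
```

The untrusted k-mers all overlap the erroneous base, which is what lets this kind of corrector localize the error before trying substitutions.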
24
Network Analysis of the Chronic Hepatitis C Virome Defines Hypervariable Region 1 Evolutionary Phenotypes in the Context of Humoral Immune Responses. J Virol 2015; 90:3318-29. [PMID: 26719263 DOI: 10.1128/jvi.02995-15] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/25/2015] [Accepted: 12/22/2015] [Indexed: 02/06/2023]
Abstract
Hypervariable region 1 (HVR1) of hepatitis C virus (HCV) comprises the first 27 N-terminal amino acid residues of E2. It is classically seen as the most heterogeneous region of the HCV genome. In this study, we assessed HVR1 evolution by using ultradeep pyrosequencing for a cohort of treatment-naive, chronically infected patients over a short, 16-week period. Organization of the sequence set into connected components that represented single nucleotide substitution events revealed a network dominated by highly connected, centrally positioned master sequences. HVR1 phenotypes were observed to be under strong purifying (stationary) and strong positive (antigenic drift) selection pressures, which were coincident with advancing patient age and cirrhosis of the liver. It followed that stationary viromes were dominated by a single HVR1 variant surrounded by minor variants that differed from it by conservative single amino acid substitutions. We present evidence suggesting that neutralizing antibody efficacy was diminished for stationary-virome HVR1 variants. Our results identify the HVR1 network structure during chronic infection as the preferential dominance of a single variant within a narrow sequence space. Importance: HCV infection is often asymptomatic, and chronic infection is generally well established in advance of initial diagnosis and subsequent treatment. HVR1 can undergo rapid sequence evolution during acute infection, and the variant pool is typically seen to diverge away from ancestral sequences as infection progresses from the acute to the chronic phase. In this report, we describe HVR1 viromes in chronically infected patients that are defined by a dominant epitope located centrally within a narrow variant pool. Our findings suggest that weakened humoral immune activity, as a consequence of persistent chronic infection, allows for the acquisition and maintenance of host-specific adaptive mutations at HVR1 that reflect virus fitness.
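The following is a minimal sketch of a one-step network of the kind described. It assumes equal-length (aligned) nucleotide variants and uses node degree as a simple stand-in for "highly connected, centrally positioned". The study's exact construction and centrality measure may differ, and the toy sequences are hypothetical.

```python
# Illustrative one-step (single-substitution) network; toy data, not study sequences.
from itertools import combinations


def one_step_network(sequences):
    """Link distinct equal-length sequences that differ by exactly one substitution."""
    unique = list(dict.fromkeys(sequences))  # de-duplicate, keep input order
    adjacency = {s: set() for s in unique}
    for a, b in combinations(unique, 2):
        if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Group nodes reachable from one another (iterative depth-first search)."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def most_connected(adjacency, component):
    """Node of highest degree within a component (candidate 'master' sequence)."""
    return max(component, key=lambda node: len(adjacency[node]))


variants = ["AAAA", "AAAT", "AATA", "ATAA", "TAAA", "GGGG"]
net = one_step_network(variants)
comps = connected_components(net)
print(sorted(len(c) for c in comps))  # [1, 5]
print(most_connected(net, comps[0]))  # AAAA
```

In this star-shaped toy component, every minor variant touches only the central sequence, mirroring the "single dominant variant surrounded by minor variants" pattern the abstract reports.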
25
Forbi JC, Layden JE, Phillips RO, Mora N, Xia GL, Campo DS, Purdy MA, Dimitrova ZE, Owusu DO, Punkova LT, Skums P, Owusu-Ofori S, Sarfo FS, Vaughan G, Roh H, Opare-Sem OK, Cooper RS, Khudyakov YE. Next-Generation Sequencing Reveals Frequent Opportunities for Exposure to Hepatitis C Virus in Ghana. PLoS One 2015; 10:e0145530. [PMID: 26683463 PMCID: PMC4684299 DOI: 10.1371/journal.pone.0145530] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/11/2015] [Accepted: 12/04/2015] [Indexed: 12/14/2022]
Abstract
Globally, hepatitis C virus (HCV) infection accounts for a large proportion of liver disease cases, including cancer. The infection is highly prevalent in sub-Saharan Africa, and West Africa has been identified as a geographic origin of two HCV genotypes. However, little is known about the genetic composition of HCV populations in many countries of the region. Using conventional and next-generation sequencing (NGS), we identified and genetically characterized 65 HCV strains circulating among HCV-positive blood donors in Kumasi, Ghana. Phylogenetic analysis using consensus sequences derived from 3 regions of the HCV genome (the 5'-untranslated region, hypervariable region 1 (HVR1) and the NS5B gene) consistently classified the HCV variants (n = 65) into genotype 1 (HCV-1, 15%) and genotype 2 (HCV-2, 85%). The Ghanaian and West African HCV-2 NS5B sequences were completely intermixed in the phylogenetic tree, indicating substantial genetic heterogeneity of HCV-2 in Ghana. Analysis of HVR1 sequences from intra-host HCV variants obtained by NGS showed that three donors were infected with >1 HCV strain, including infections with 2 genotypes. Two other donors shared an HCV strain, indicating HCV transmission between them. The HCV-2 strain sampled from one donor was replaced with another HCV-2 strain after only 2 months of observation, indicating rapid strain switching. Bayesian analysis estimated that the HCV-2 strains in Ghana have been expanding since the 16th century. The blood donors in Kumasi, Ghana, are infected with a very heterogeneous HCV population of HCV-1 and HCV-2, with HCV-2 being prevalent. The detection of three cases of co- or super-infection and of transmission linkage between 2 cases suggests frequent opportunities for HCV exposure among the blood donors and is consistent with the reported high HCV prevalence. The conditions for effective HCV-2 transmission have existed for ~3–4 centuries, indicating a long epidemic history of HCV-2 in Ghana.
Affiliation(s)
- Joseph C. Forbi
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Jennifer E. Layden
- Department of Public Health Sciences, Loyola University Chicago, Maywood, Illinois, United States of America
- Department of Medicine, Loyola University Chicago, Stritch School of Medicine, Maywood, IL, United States of America
- Richard O. Phillips
- Komfo Anokye Teaching Hospital, Kumasi, Ghana, West Africa
- Kwame Nkrumah University of Science and Technology, Kumasi, Ghana, West Africa
- Nallely Mora
- Department of Public Health Sciences, Loyola University Chicago, Maywood, Illinois, United States of America
- Guo-liang Xia
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- David S. Campo
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Michael A. Purdy
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Zoya E. Dimitrova
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Lili T. Punkova
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Pavel Skums
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Fred Stephen Sarfo
- Komfo Anokye Teaching Hospital, Kumasi, Ghana, West Africa
- Kwame Nkrumah University of Science and Technology, Kumasi, Ghana, West Africa
- Gilberto Vaughan
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Hajung Roh
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
- Richard S. Cooper
- Department of Public Health Sciences, Loyola University Chicago, Maywood, Illinois, United States of America
- Yury E. Khudyakov
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
26
Zhou B, Dong H, He Y, Sun J, Jin W, Xie Q, Fan R, Wang M, Li R, Chen Y, Xie S, Shen Y, Huang X, Wang S, Lu F, Jia J, Zhuang H, Locarnini S, Zhao GP, Jin L, Hou J. Composition and Interactions of Hepatitis B Virus Quasispecies Defined the Virological Response During Telbivudine Therapy. Sci Rep 2015; 5:17123. [PMID: 26599443 PMCID: PMC4657086 DOI: 10.1038/srep17123] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/06/2015] [Accepted: 10/26/2015] [Indexed: 01/08/2023]
Abstract
Reverse transcriptase (RT) mutations contribute to hepatitis B virus resistance during antiviral therapy with nucleos(t)ide analogs. However, the composition of the RT quasispecies and their interactions during antiviral treatment have not yet been thoroughly defined. In this report, 10 patients from each of 3 different virological response groups, i.e., complete virological response, partial virological response and virological breakthrough, were selected from a multicenter trial of Telbivudine treatment. Variations in the drug resistance-related critical RT regions in 107 serial serum samples from the 30 patients were examined by ultra-deep sequencing. A total of 496,577 sequence reads were obtained, with an average sequencing coverage of 4,641× per sample. The phylogenies of the quasispecies revealed the independent origins of two critical quasispecies, i.e., the rtA181T and rtM204I mutants. Data analyses and theoretical modeling showed a cooperative-competitive interplay among the quasispecies. In particular, rtM204I mutants compete against other quasispecies, which eventually leads to virological breakthrough. However, in the absence of rtM204I mutants, synergistic growth of the drug-resistant rtA181T mutants with the wild-type quasispecies could drive the composition of the viral population into a state of partial virological response. Furthermore, we demonstrated that the frequency of drug-resistant mutations in the early phase of treatment is important for predicting the virological response to antiviral therapy.
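The abstract does not give the form of the theoretical model. A generalized Lotka–Volterra system is one standard way to express cooperative (positive) and competitive (negative) interactions among subpopulations. The sketch below integrates such a system with forward Euler. The three-population setup and every parameter value are hypothetical, chosen only to reproduce the qualitative pattern described: rtM204I outcompetes the others, whereas without it the wild type and rtA181T coexist.

```python
# Hypothetical generalized Lotka-Volterra sketch; parameters are illustrative, not fitted.


def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j).

    x0: initial abundances, r: intrinsic growth rates, A: interaction matrix
    (A[i][j] > 0 means j helps i; A[i][j] < 0 means j suppresses i).
    Abundances are clipped at zero so a population cannot go negative.
    """
    x = [float(v) for v in x0]
    n = len(x)
    for _ in range(steps):
        rates = [r[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [max(0.0, x[i] + dt * x[i] * rates[i]) for i in range(n)]
    return x


# Hypothetical parameters, ordered as [wild type, rtA181T, rtM204I].
R = [0.2, 0.5, 0.8]
A = [
    [-1.0, 0.2, -1.0],   # wild type: helped by rtA181T, suppressed by rtM204I
    [0.2, -1.0, -1.0],   # rtA181T: helped by wild type, suppressed by rtM204I
    [-0.1, -0.1, -1.0],  # rtM204I: only weakly affected by the others
]

with_m204i = simulate_glv([0.5, 0.01, 0.01], R, A)
without_m204i = simulate_glv([0.5, 0.01, 0.0], R, A)
print(with_m204i)
print(without_m204i)
```

With these made-up values, the run seeded with a small rtM204I population approaches rtM204I ≈ 0.8, and the other two populations decline toward zero. The run without rtM204I settles near the coexistence point (0.3125, 0.5625), which solves 0.2 - x_wt + 0.2·x_181 = 0 and 0.5 + 0.2·x_wt - x_181 = 0. An absent population stays at zero because its growth term is proportional to its own abundance.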
Affiliation(s)
- Bin Zhou
- State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Hui Dong
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China
- Yungang He
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Jian Sun
- State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Weirong Jin
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China; Shanghai Shenyou Biotechnology Co., Ltd., Shanghai, China
- Qing Xie
- Department of Infectious Diseases, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China
- Rong Fan
- State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Minxian Wang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Ran Li
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yangyi Chen
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China
- Shaoqing Xie
- Shanghai Shenyou Biotechnology Co., Ltd., Shanghai, China
- Yan Shen
- Shanghai Shenyou Biotechnology Co., Ltd., Shanghai, China
- Xin Huang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shengyue Wang
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China
- Fengming Lu
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Jidong Jia
- Liver Research Center, Beijing Friendship Hospital, Capital Medical University, Beijing, China
- Hui Zhuang
- Department of Microbiology and Infectious Disease Center, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Stephen Locarnini
- Victorian Infectious Diseases Reference Laboratory, North Melbourne, Victoria, Australia
- Guo-Ping Zhao
- Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai, China; CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; Department of Microbiology and Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China; State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences; Key Laboratory of Medical Molecular Virology affiliated to the Ministries of Education and Health, Shanghai Medical College and Department of Microbiology, School of Life Sciences; Fudan University, Shanghai, China
- Li Jin
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology; CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences; Key Laboratory of Medical Molecular Virology affiliated to the Ministries of Education and Health, Shanghai Medical College and Department of Microbiology, School of Life Sciences; Fudan University, Shanghai, China
- Jinlin Hou
- State Key Laboratory of Organ Failure Research, Guangdong Provincial Key Laboratory of Viral Hepatitis Research, Department of Infectious Diseases, Nanfang Hospital, Southern Medical University, Guangzhou, China; Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang University, Hangzhou, China
27
Jones LR, Sede M, Manrique JM, Quarleri J. Virus evolution during chronic hepatitis B virus infection as revealed by ultradeep sequencing data. J Gen Virol 2015; 97:435-444. [PMID: 26581478 DOI: 10.1099/jgv.0.000344] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022]
Abstract
Although chronic hepatitis B virus (HBV) infection (CHB) is a leading cause of liver cirrhosis and cancer, HBV evolution during CHB is not fully understood. Recent studies have indicated that virus diversity progressively increases over the course of CHB and that some virus mutations correlate with severe liver conditions such as chronic hepatitis, cirrhosis and hepatocellular carcinoma. Using ultradeep sequencing (UDS) data from an intrafamilial case, we detected such mutations at low frequencies in three immunotolerant patients and at high frequencies in an inactive carrier. Furthermore, our analyses indicated that the HBV population from the seroconverter patient underwent many genetic changes in response to virus clearance. Together, these data indicate a potential use of UDS for developing non-invasive biomarkers for monitoring disease changes over time or in response to specific therapies. In addition, our analyses revealed that virus clearance did not appear to require a decline in the effective population size of the virus. A detailed genetic analysis of the viral lineages arising during and after clearance suggested that mutations at or close to critical elements of the core promoter (enhancer II, epsilon encapsidation signal, TA2, TA3 and direct repeat 1-hormone response element) might be responsible for sustained replication. This hypothesis requires the decline in virus load to be explained by constant clearance of virus-producing hepatocytes, consistent with the sustained progression towards serious liver conditions experienced by many CHB patients.
Affiliation(s)
- Leandro R Jones
- Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, 9 de Julio y Belgrano S/N (9100) Trelew, Chubut, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917 (C1083ACA) Buenos Aires, Argentina
- Mariano Sede
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917 (C1083ACA) Buenos Aires, Argentina; Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11 (C1121ABG) Buenos Aires, Argentina
- Julieta M Manrique
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917 (C1083ACA) Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, 9 de Julio y Belgrano S/N (9100) Trelew, Chubut, Argentina
- Jorge Quarleri
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Facultad de Medicina, Universidad de Buenos Aires, Paraguay 2155-Piso 11 (C1121ABG) Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917 (C1083ACA) Buenos Aires, Argentina
28
Campo DS, Xia GL, Dimitrova Z, Lin Y, Forbi JC, Ganova-Raeva L, Punkova L, Ramachandran S, Thai H, Skums P, Sims S, Rytsareva I, Vaughan G, Roh HJ, Purdy MA, Sue A, Khudyakov Y. Accurate Genetic Detection of Hepatitis C Virus Transmissions in Outbreak Settings. J Infect Dis 2015; 213:957-65. [PMID: 26582955 DOI: 10.1093/infdis/jiv542] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 08/15/2015] [Accepted: 10/08/2015] [Indexed: 12/18/2022]
Abstract
Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections are associated with unsafe injection practices, drug diversion, and other exposures to blood, and are difficult to detect and investigate. Here, we developed and validated a simple approach for molecular detection of HCV transmissions in outbreak settings. Using the end-point limiting-dilution (EPLD) technique, we obtained sequences from the HCV hypervariable region 1 (HVR1) from 127 cases involved in 32 epidemiologically defined HCV outbreaks and from 193 individuals with unrelated HCV strains. We compared several types of genetic distances and calculated a threshold, using minimal Hamming distances, that identifies transmission clusters in all tested outbreaks with 100% accuracy. The approach was also validated on sequences obtained by next-generation sequencing from HCV strains recovered from 239 individuals, and it showed the same accuracy as with EPLD. On average, the nucleotide diversity of the intrahost population was 6.2 times greater in the source case than in any incident case, allowing the correct detection of transmission direction in 8 outbreaks for which source cases were known. The simple and accurate distance-based approach developed here for detecting HCV transmissions streamlines molecular investigation of outbreaks, thus improving the public health capacity for rapid and effective control of hepatitis C.
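The procedure described above can be sketched as follows. The abstract does not state the threshold value, or whether distances are normalized by sequence length. In this sketch, `threshold` is therefore a hypothetical raw mismatch count that would need to be replaced by the published cut-off. The sketch links hosts whose closest pair of variants falls within the threshold and merges links into clusters. Within a cluster, the host with the most diverse intrahost population is flagged as the likely source, following the reported observation that source cases were more diverse. All function names and the toy sequences are illustrative assumptions.

```python
# Illustrative distance-threshold clustering; threshold and data are hypothetical.
from itertools import combinations


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(pop_a, pop_b):
    """Minimal Hamming distance between any variant of one host and any of another."""
    return min(hamming(a, b) for a in pop_a for b in pop_b)


def transmission_clusters(hosts, threshold):
    """Link hosts whose minimal distance is <= threshold; return linked groups.

    hosts: dict host_id -> list of aligned, equal-length sequences.
    A small union-find makes A-B and B-C links place A, B and C together.
    """
    parent = {h: h for h in hosts}

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving
            h = parent[h]
        return h

    for a, b in combinations(hosts, 2):
        if min_distance(hosts[a], hosts[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for h in hosts:
        groups.setdefault(find(h), []).append(h)
    return sorted(sorted(g) for g in groups.values())


def nucleotide_diversity(seqs):
    """Mean pairwise per-site difference within one host's variants."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * len(seqs[0]))


def likely_source(hosts, cluster):
    """Guess the source as the host with the most diverse intrahost population."""
    return max(cluster, key=lambda h: nucleotide_diversity(hosts[h]))


hosts = {
    "P1": ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "TAAAAAAAAA"],
    "P2": ["AAAAAAAAAT"],
    "P3": ["GGGGGGGGGG"],
}
clusters = transmission_clusters(hosts, threshold=1)
print(clusters)                           # [['P1', 'P2'], ['P3']]
print(likely_source(hosts, clusters[0]))  # P1
```

Using the minimum rather than the mean distance between hosts reflects the idea that a single shared or near-identical variant is enough to suggest a transmission link, even when the two intrahost populations are otherwise broad.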
Affiliation(s)
- David S Campo
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Guo-Liang Xia
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Zoya Dimitrova
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Yulin Lin
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Joseph C Forbi
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Lilia Ganova-Raeva
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Lili Punkova
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Sumathi Ramachandran
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Hong Thai
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Pavel Skums
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Seth Sims
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Inna Rytsareva
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Gilberto Vaughan
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Ha-Jung Roh
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Michael A Purdy
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Amanda Sue
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
- Yury Khudyakov
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, Georgia
29
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next-generation sequencing (NGS) innovations have set a compelling landmark in life science and changed the direction of research in clinical oncology through their utility in diagnosing and treating cancer. The aim of our portal, comprehensive resources for cancer NGS data analysis (CRCDA), is to provide a collection of different NGS tools and pipelines under diverse classes, together with cancer pathways and databases and, in addition, literature information from PubMed. The literature data were restricted to the 18 most common cancer types, such as breast cancer, colon cancer and other cancers that occur in the worldwide population. For convenience, the NGS cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis are listed to provide out-of-the-box solutions for NGS data analysis, which may help researchers overcome the challenges of selecting and configuring individual tools for analysing exome, whole-genome and transcriptome data. An extensive search page was developed that can be queried by (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge is finding and using the right tool for the right application. The objective of this work is to collect the tools in each category that are available at various places and to arrange these tools and other data in a simple and user-friendly manner, so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available for cancer NGS data analysis. Given these factors, we believe that this website will be a useful resource for the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
- Ramesh Kumar Gopal
- AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
30
Cacho A, Smirnova E, Huzurbazar S, Cui X. A Comparison of Base-calling Algorithms for Illumina Sequencing Technology. Brief Bioinform 2015; 17:786-95. [PMID: 26443614 DOI: 10.1093/bib/bbv088] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/28/2015] [Indexed: 11/14/2022]
Abstract
Recent advances in next-generation sequencing technology have yielded increasing cost-effectiveness and higher throughput produced per run, in turn, greatly influencing the analysis of DNA sequences. Among the various sequencing technologies, Illumina is by far the most widely used platform. However, the Illumina sequencing platform suffers from several imperfections that can be attributed to the chemical processes inherent to the sequencing-by-synthesis technology. With the enormous amounts of reads produced, statistical methodologies and computationally efficient algorithms are required to improve the accuracy and speed of base-calling. Over the past few years, several papers have proposed methods to model the various imperfections, giving rise to accurate and/or efficient base-calling algorithms. In this article, we provide a comprehensive comparison of the performance of recently developed base-callers and we present a general statistical model that unifies a large majority of these base-callers.
31
Liu Y, Chiaromonte F, Ross H, Malhotra R, Elleder D, Poss M. Error correction and statistical analyses for intra-host comparisons of feline immunodeficiency virus diversity from high-throughput sequencing data. BMC Bioinformatics 2015; 16:202. [PMID: 26123018 PMCID: PMC4486422 DOI: 10.1186/s12859-015-0607-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/06/2014] [Accepted: 04/29/2015] [Indexed: 11/16/2022]
Abstract
Background Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3′ half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. Results Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies, and it quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual- and single-infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or by the infection status by tissue interaction. Altogether, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads).
We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy. Conclusions Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.
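The FIV study builds its error correction on the exponential-normal convolution model used for microarray background correction. The sketch below shows only that generic model, not the authors' modification for minor allele frequencies: an observed value X is treated as a true non-negative signal S ~ Exponential(rate alpha) plus noise N ~ Normal(mu, sigma^2), and the corrected value is the posterior mean E[S | X = x] = a + sigma * pdf(a/sigma) / cdf(a/sigma), with a = x - mu - sigma^2 * alpha. The function name and the parameter values in the usage lines are assumptions for illustration; in practice mu, sigma and alpha would be estimated from the data.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    # erfc is more accurate than 1 + erf in the lower tail
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def expnorm_signal(x, mu, sigma, alpha):
    """Posterior mean E[S | X = x] under X = S + N (hypothetical helper).

    S ~ Exponential(rate=alpha) is the true (non-negative) signal and
    N ~ Normal(mu, sigma**2) is the error. Given X = x, S follows a normal
    distribution with mean a = x - mu - sigma**2 * alpha and sd sigma,
    truncated to [0, inf); its mean is a + sigma * pdf(a/sigma) / cdf(a/sigma).
    """
    if sigma <= 0 or alpha <= 0:
        raise ValueError("sigma and alpha must be positive")
    a = x - mu - sigma * sigma * alpha
    z = a / sigma
    if z < -30.0:
        # Deep lower tail: pdf/cdf ~ -z - 1/z, so the mean ~ -sigma**2 / a
        return -sigma * sigma / a
    return a + sigma * _norm_pdf(z) / _norm_cdf(z)


# Illustrative, made-up error parameters for a frequency-like measurement
for freq in (0.0, 0.02, 0.5):
    print(freq, expnorm_signal(freq, mu=0.01, sigma=0.005, alpha=10.0))
```

Unlike simple thresholding, this shrinks small observed values toward zero smoothly while leaving large values shifted by roughly mu + sigma^2 * alpha, and it never returns a negative signal.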
Affiliation(s)
- Yang Liu
- Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
- Francesca Chiaromonte
- Department of Statistics, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
- Howard Ross
- Bioinformatics Institute, School of Biological Sciences, University of Auckland, Auckland, 1142, New Zealand.
- Raunaq Malhotra
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, 16802, USA.
- Daniel Elleder
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA; Current address: Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Videnska 1083, Prague, 14000, Czech Republic.
- Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Veterinary and Biomedical Sciences, The Pennsylvania State University, University Park, PA, 16802, USA; The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, 16802, USA.
32
Lin K, Smit S, Bonnema G, Sanchez-Perez G, de Ridder D. Making the difference: integrating structural variation detection tools. Brief Bioinform 2014; 16:852-64. [PMID: 25504367 DOI: 10.1093/bib/bbu047] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 08/20/2014] [Indexed: 01/01/2023]
Abstract
From prokaryotes to eukaryotes, phenotypic variation, adaptation and speciation have been associated with structural variation between genomes of individuals within the same species. Many computer algorithms for detecting such variations (callers) have recently been developed, spurred by the advent of next-generation sequencing technology. Such callers mainly exploit split-read mapping or paired-end read mapping. However, as different callers are geared towards different types of structural variation, there is still no single caller that can be considered a community standard; instead, the various callers are increasingly combined in integrated pipelines. In this article, we review a wide range of callers, discuss challenges in the integration step and present a survey of pipelines used in population genomics studies. Based on our findings, we provide general recommendations on how to set up such pipelines. Finally, we present an outlook on future challenges in structural variation detection.
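The abstract recommends combining callers in integrated pipelines but does not prescribe a merging rule. A frequently used heuristic, assumed here purely for illustration, is to treat two calls of the same type on the same chromosome as one event when they share at least 50% reciprocal overlap, and to keep events supported by at least two callers. The `SVCall` type, the function names and the caller labels below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    caller: str   # name of the tool that produced the call
    chrom: str
    start: int    # 0-based start, inclusive
    end: int      # end, exclusive
    svtype: str   # e.g. "DEL", "DUP", "INV"


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions; 0.0 if type/chromosome differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0   # disjoint intervals (also covers zero-length calls)
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def multi_caller_support(calls, min_ro=0.5, min_callers=2):
    """Keep calls matched (>= min_ro reciprocal overlap) by >= min_callers tools.

    A call always matches itself, so its own caller counts toward support.
    """
    kept = []
    for c in calls:
        support = {o.caller for o in calls if reciprocal_overlap(c, o) >= min_ro}
        if len(support) >= min_callers:
            kept.append(c)
    return kept


calls = [
    SVCall("callerA", "chr1", 1000, 2000, "DEL"),
    SVCall("callerB", "chr1", 1100, 2050, "DEL"),  # same deletion, shifted
    SVCall("callerC", "chr1", 5000, 5100, "DEL"),  # seen by one tool only
    SVCall("callerB", "chr2", 1000, 2000, "DEL"),  # different chromosome
]
print([c.caller for c in multi_caller_support(calls)])
```

Requiring agreement between callers trades sensitivity for specificity; pipelines that instead take the union of calls make the opposite trade-off.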
33
Wood GR, Burroughs NJ, Evans DJ, Ryabov EV. Error correction and diversity analysis of population mixtures determined by NGS. PeerJ 2014; 2:e645. [PMID: 25405074 PMCID: PMC4232844 DOI: 10.7717/peerj.645] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/17/2014] [Accepted: 10/10/2014] [Indexed: 11/20/2022]
Abstract
The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. First, a method for correction of next generation sequencing error in the distribution of nucleotides at a site is developed. Second, a package of methods for assessment of nucleotide diversity is assembled. The error correction method is statistically based and works at the level of the nucleotide distribution rather than the level of individual nucleotides. The method relies on an error model and a sample of known viral genotypes that is used for model calibration. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated. The methods are illustrated using honeybee viral samples. Software in both Excel and Matlab and a guide are available at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/, the Warwick University Systems Biology Centre software download site.
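The abstract does not name the diversity measures in the package. As one common example of a per-site statistic that works on the nucleotide distribution (the same level at which the error correction operates), the Gini-Simpson index 1 - sum(p_i^2) can be computed per site and averaged across sites. This is an illustrative choice, not necessarily one of the authors' measures, and the function names are hypothetical; error correction would be applied to the distributions first.

```python
from collections import Counter


def site_diversity(bases):
    """Gini-Simpson index 1 - sum(p_i**2) over A/C/G/T observed at one site.

    Characters other than A, C, G, T (gaps, N) are ignored.
    """
    counts = Counter(b for b in bases if b in "ACGT")
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no A/C/G/T observations at this site")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def mean_diversity(columns):
    """Average per-site diversity over alignment columns (one string per site)."""
    return sum(site_diversity(col) for col in columns) / len(columns)


print(site_diversity("AAAA"))  # 0.0, a monomorphic site
print(site_diversity("ACGT"))  # 0.75, all four bases equally frequent
```

The index is the probability that two reads drawn at random (with replacement) carry different bases at the site, so it ranges from 0 to 0.75 for four nucleotides.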
Affiliation(s)
- Graham R Wood
- Warwick Systems Biology Centre, University of Warwick, Coventry, United Kingdom
- Nigel J Burroughs
- Warwick Systems Biology Centre, University of Warwick, Coventry, United Kingdom
- David J Evans
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
- Eugene V Ryabov
- School of Life Sciences, University of Warwick, Coventry, United Kingdom
34
Skums P, Artyomenko A, Glebova O, Ramachandran S, Mandoiu I, Campo DS, Dimitrova Z, Zelikovsky A, Khudyakov Y. Computational framework for next-generation sequencing of heterogeneous viral populations using combinatorial pooling. Bioinformatics 2014; 31:682-90. [PMID: 25359889 DOI: 10.1093/bioinformatics/btu726] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Next-generation sequencing (NGS) allows for analyzing a large number of viral sequences from infected patients, providing an opportunity to implement large-scale molecular surveillance of viral diseases. However, despite improvements in technology, traditional protocols for NGS of large numbers of samples are still highly cost and labor intensive. One of the possible cost-effective alternatives is combinatorial pooling. Although a number of pooling strategies for consensus sequencing of DNA samples and detection of SNPs have been proposed, these strategies cannot be applied to sequencing of highly heterogeneous viral populations. RESULTS We developed a cost-effective and reliable protocol for sequencing of viral samples, that combines NGS using barcoding and combinatorial pooling and a computational framework including algorithms for optimal virus-specific pools design and deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces the sequencing costs and allows deconvolution of viral populations with a high accuracy. AVAILABILITY AND IMPLEMENTATION The source code and experimental data sets are available at http://alan.cs.gsu.edu/NGS/?q=content/pooling.
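The framework above designs virus-specific pools and deconvolves highly heterogeneous samples; neither the design algorithm nor the deconvolution is given in the abstract. The toy below only illustrates the identifiability idea behind combinatorial pooling, under a strong simplifying assumption that a variant is present in exactly one sample and is detected in every pool containing that sample. Each sample is placed in a unique pair of pools, so the set of pools where such a variant appears points back to its source. All names here are hypothetical.

```python
from itertools import combinations


def design_pools(samples, n_pools):
    """Give every sample a distinct pair of pools (toy 2-of-n design).

    With n pools there are n*(n-1)/2 distinct pairs, so 4 pools can
    encode 6 samples. Returns {sample: frozenset of pool indices}.
    """
    codes = list(combinations(range(n_pools), 2))
    if len(samples) > len(codes):
        raise ValueError("not enough distinct pool pairs for all samples")
    return {s: frozenset(code) for s, code in zip(samples, codes)}


def deconvolve(design, pools_with_variant):
    """Samples whose pool set equals the pools where a private variant appears."""
    observed = frozenset(pools_with_variant)
    return [s for s, pools in design.items() if pools == observed]


design = design_pools(["s1", "s2", "s3", "s4", "s5", "s6"], n_pools=4)
print(design["s5"])                # frozenset({1, 3})
print(deconvolve(design, {1, 3}))  # ['s5']
```

Six samples are sequenced in only four pools here; real viral populations share variants across samples, which is why the published framework needs a more elaborate design and deconvolution.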
Affiliation(s)
- Pavel Skums
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Alexander Artyomenko
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Olga Glebova
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Sumathi Ramachandran
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Ion Mandoiu
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- David S Campo
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Zoya Dimitrova
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Alex Zelikovsky
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
- Yury Khudyakov
- Division of Viral Hepatitis, Centers of Disease Control and Prevention, Atlanta, GA, USA, Department of Computer Science, Georgia State University, Atlanta, GA, USA and Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
35
Analysis of the evolution and structure of a complex intrahost viral population in chronic hepatitis C virus mapped by ultradeep pyrosequencing. J Virol 2014; 88:13709-21. [PMID: 25231312 DOI: 10.1128/jvi.01732-14] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/18/2022]
Abstract
Hepatitis C virus (HCV) causes chronic infection in up to 50% to 80% of infected individuals. Hypervariable region 1 (HVR1) variability is frequently studied to gain an insight into the mechanisms of HCV adaptation during chronic infection, but the changes to and persistence of HCV subpopulations during intrahost evolution are poorly understood. In this study, we used ultradeep pyrosequencing (UDPS) to map the viral heterogeneity of a single patient over 9.6 years of chronic HCV genotype 4a infection. Informed error correction of the raw UDPS data was performed using a temporally matched clonal data set. The resultant data set reported the detection of low-frequency recombinants throughout the study period, implying that recombination is an active mechanism through which HCV can explore novel sequence space. The data indicate that polyvirus infection of hepatocytes has occurred but that the fitness quotients of recombinant daughter virions are too low for the daughter virions to compete against the parental genomes. The subpopulations of parental genomes contributing to the recombination events highlighted a dynamic virome where subpopulations of variants are in competition. In addition, we provide direct evidence that demonstrates the growth of subdominant populations to dominance in the absence of a detectable humoral response. IMPORTANCE Analysis of ultradeep pyrosequencing data sets derived from virus amplicons frequently relies on software tools that are not optimized for amplicon analysis, assume random incorporation of sequencing errors, and are focused on achieving higher specificity at the expense of sensitivity. Such analysis is further complicated by the presence of hypervariable regions. In this study, we made use of a temporally matched reference sequence data set to inform error correction algorithms.
Using this methodology, we were able to (i) detect multiple instances of hepatitis C virus intrasubtype recombination at the E1/E2 junction (a phenomenon rarely reported in the literature) and (ii) interrogate the longitudinal quasispecies complexity of the virome. Parallel to the UDPS, isolation of IgG-bound virions was found to coincide with the collapse of specific viral subpopulations.
36
Chabria SB, Gupta S, Kozal MJ. Deep Sequencing of HIV: Clinical and Research Applications. Annu Rev Genomics Hum Genet 2014; 15:295-325. [DOI: 10.1146/annurev-genom-091212-153406] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 11/09/2022]
Affiliation(s)
- Shiven B. Chabria
- Section of Infectious Diseases, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06510
- Shaili Gupta
- Section of Infectious Diseases, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06510
- Section of Infectious Diseases, Department of Internal Medicine, VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Michael J. Kozal
- Section of Infectious Diseases, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06510
- Section of Infectious Diseases, Department of Internal Medicine, VA Connecticut Healthcare System, West Haven, Connecticut 06516
37
Sede MM, Moretti FA, Laufer NL, Jones LR, Quarleri JF. HIV-1 tropism dynamics and phylogenetic analysis from longitudinal ultra-deep sequencing data of CCR5- and CXCR4-using variants. PLoS One 2014; 9:e102857. [PMID: 25032817 PMCID: PMC4102574 DOI: 10.1371/journal.pone.0102857] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/16/2013] [Accepted: 06/25/2014] [Indexed: 11/25/2022]
Abstract
OBJECTIVE A coreceptor switch from CCR5 to CXCR4 is associated with HIV disease progression. The molecular and evolutionary mechanisms underlying the CCR5 to CXCR4 switch are the focus of intense recent research. We studied HIV-1 tropism dynamics in relation to coreceptor usage, the nature of quasispecies from ultra-deep sequencing (UDPS) data and their phylogenetic relationships. METHODS Here, we characterized C2-V3-C3 sequences of HIV obtained from 19 patients followed up for 54 to 114 months using UDPS, with further genotyping and phylogenetic analysis for coreceptor usage. HIV quasispecies diversity and variability as well as HIV plasma viral load were measured longitudinally, and their relationship with HIV coreceptor usage was analyzed. The longitudinal UDPS data were submitted to phylogenetic analysis, and sampling times and coreceptor usage were mapped onto the resulting trees. RESULTS Although a temporal viral genetic structuring was evident, the persistence of several viral lineages evolving independently over the course of infection was statistically supported, indicating a complex scenario for the evolution of viral quasispecies. HIV X4-using variants were present in most of our patients, exhibiting dissimilar inter- and intra-patient predominance as components of the quasispecies, even on antiretroviral therapy. The viral populations from some of the patients studied displayed evidence of the evolution of X4 variants through fitness valleys, whereas for other patients the data favored a gradual mode of emergence. CONCLUSIONS CXCR4 usage can emerge independently, in multiple lineages, over the course of HIV infection. The mode of emergence, i.e. gradual or through fitness valleys, seems to depend on both virus and patient factors. Furthermore, our analyses suggest that, besides becoming dominant after population-level switches, minor proportions of X4 viruses might exist throughout the infection, perhaps even at its early stages.
The fate of these minor variants might depend on both viral and host factors.
Affiliation(s)
- Mariano M. Sede
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Franco A. Moretti
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Natalia L. Laufer
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Leandro R. Jones
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales, sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, Chubut, Argentina
- Jorge F. Quarleri
- Instituto de Investigaciones Biomédicas en Retrovirus y Sida (INBIRS), Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina
- Consejo de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
38
Campo DS, Dimitrova Z, Yamasaki L, Skums P, Lau DT, Vaughan G, Forbi JC, Teo CG, Khudyakov Y. Next-generation sequencing reveals large connected networks of intra-host HCV variants. BMC Genomics 2014; 15 Suppl 5:S4. [PMID: 25081811 PMCID: PMC4120142 DOI: 10.1186/1471-2164-15-s5-s4] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 12/16/2022]
Abstract
Background Next-generation sequencing (NGS) allows for sampling numerous viral variants from infected patients. This provides a novel opportunity to represent and study the mutational landscape of Hepatitis C Virus (HCV) within a single host. Results Intra-host variants of the HCV E1/E2 region were extensively sampled from 58 chronically infected patients. After NGS error correction, the average numbers of reads and variants obtained from each sample were 3202 and 464, respectively. The distance between each pair of variants was calculated and networks were created for each patient, where each node is a variant and two nodes are connected by a link if the nucleotide distance between them is 1. The work focused on large components having > 5% of all reads, which on average account for 93.7% of all reads found in a patient. The distance between any two variants calculated over the component correlated strongly with nucleotide distances (r = 0.9499; p = 0.0001), a better correlation than the one obtained with Neighbour-Joining trees (r = 0.7624; p = 0.0001). In each patient, components were well separated, with the average distance between components (6.53%) being 10 times greater than the average distance within each component (0.68%). The ratio of nonsynonymous to synonymous changes was calculated, and some patients (6.9%) showed a mixture of networks under strong negative and positive selection. All components were robust to in silico stochastic sampling; even after randomly removing 85% of all reads, the largest connected component in the new subsample still involved 82.4% of remaining nodes. In vitro sampling showed that 93.02% of components present in the original sample were also found in experimental replicas, with 81.6% of reads found in both. When syringe-sharing transmission events were simulated, 91.2% of all simulated transmission events seeded all components present in the source.
Conclusions Most intra-host variants are organized into distinct single-mutation components that are: well separated from each other, represent genetic distances between viral variants, robust to sampling, reproducible and likely seeded during transmission events. Facilitated by NGS, large components offer a novel evolutionary framework for genetic analysis of intra-host viral populations and understanding transmission, immune escape and drug resistance.
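The network construction described above (variants as nodes, links between variants at nucleotide distance 1, focus on components holding more than 5% of reads) can be sketched with a union-find over all variant pairs. This is a minimal illustration assuming error-corrected variants that are already aligned to equal length, each with a read count; the function names and toy counts are invented, and the all-pairs comparison is quadratic, which is adequate for a few hundred variants per sample.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("variants must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def one_step_components(variants):
    """Group variants into connected components of the distance-1 network."""
    seqs = list(variants)
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of variants that differ at exactly one position
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) == 1:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, seq in enumerate(seqs):
        groups[find(i)].append(seq)
    return list(groups.values())


def large_components(variants, min_fraction=0.05):
    """Keep components holding more than min_fraction of all reads."""
    total = sum(variants.values())
    return [
        comp for comp in one_step_components(variants)
        if sum(variants[s] for s in comp) / total > min_fraction
    ]


# Toy, invented read counts per (already error-corrected) variant
toy = {"AAAA": 50, "AAAT": 30, "AATT": 10, "GGGG": 5, "CCCC": 2}
print(large_components(toy))
```

Note that AAAA and AATT end up in the same component even though they differ at two positions, because AAAT bridges them; components are defined by chains of single mutations, not by pairwise distance.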
39
Giannuzzi G, Migliavacca E, Reymond A. Novel H3K4me3 marks are enriched at human- and chimpanzee-specific cytogenetic structures. Genome Res 2014; 24:1455-68. [PMID: 24916972 PMCID: PMC4158755 DOI: 10.1101/gr.167742.113] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/26/2023]
Abstract
Human and chimpanzee genomes are 98.8% identical within comparable sequences. However, they differ structurally in nine pericentric inversions, one fusion that originated human chromosome 2, and content and localization of heterochromatin and lineage-specific segmental duplications. The possible functional consequences of these cytogenetic and structural differences are not fully understood and their possible involvement in speciation remains unclear. We show that subtelomeric regions (regions that have a species-specific organization, are more divergent in sequence, and are enriched in genes and recombination hotspots) are significantly enriched for species-specific histone modifications that decorate transcription start sites in different tissues in both human and chimpanzee. The human lineage-specific chromosome 2 fusion point and ancestral centromere locus as well as chromosome 1 and 18 pericentric inversion breakpoints showed enrichment of human-specific H3K4me3 peaks in the prefrontal cortex. Our results reveal an association between plastic regions and potential novel regulatory elements.
Affiliation(s)
- Giuliana Giannuzzi
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Eugenia Migliavacca
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
40
Xiaobai Z, Xi C, Tian H, Williams AB, Wang H, He J, Zhen J, Chiarella J, Blake LA, Turenchalk G, Kozal MJ. Prevalence of WHO transmitted drug resistance mutations by deep sequencing in antiretroviral-naïve subjects in Hunan Province, China. PLoS One 2014; 9:e98740. [PMID: 24896087 PMCID: PMC4045886 DOI: 10.1371/journal.pone.0098740] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/19/2013] [Accepted: 05/07/2014] [Indexed: 11/18/2022]
Abstract
BACKGROUND There are few data on the prevalence of WHO transmitted drug resistance mutations (TDRs) that could affect treatment responses to first-line antiretroviral therapy (ART) in Hunan Province, China. OBJECTIVE Determine the prevalence of WHO NRTI/NNRTI/PI TDRs in ART-naïve subjects in Hunan Province by deep sequencing. METHODS ART-naïve subjects diagnosed in Hunan between 2010 and 2011 were evaluated by deep sequencing for low-frequency HIV variants possessing WHO TDRs down to the 1% level. Mutations were scored using the HIVdb.stanford.edu algorithm to infer drug susceptibility. RESULTS Deep sequencing was performed on samples from 90 ART-naïve subjects; 83.3% were infected with subtype AE. All subjects had advanced disease (average CD4 count 134 cells/mm3). Overall, 25.6% (23/90) of subjects had HIV with major WHO NRTI/NNRTI TDRs by deep sequencing at a variant frequency level ≥ 1%; 16.7% (15/90) had an NRTI TDR and 12.2% (11/90) had a major NNRTI TDR. The majority of NRTI/NNRTI mutations were identified at variant levels <5%. Mutations were analyzed by HIVdb.stanford.edu, and 7.8% of subjects had variants with high-level nevirapine resistance; 4.4% had high-level NRTI resistance. Deep sequencing identified 24 (27.6%) subjects with variants possessing either a PI TDR or an hivdb.stanford.edu PI mutation (algorithm value ≥ 15). Seventeen (19.5%) had PI TDRs at levels >1%. CONCLUSIONS ART-naïve subjects from Hunan Province, China, infected predominantly with subtype AE frequently possessed HIV variants with WHO NRTI/NNRTI TDRs by deep sequencing that would affect the first-line ART used in the region. Specific mutations conferring high-level nevirapine resistance were identified in 7.8% of subjects. The majority of TDRs detected were at variant levels <5%, likely because subjects had advanced chronic disease at the time of testing. PI TDRs were identified frequently, but were found in isolation and at low variant frequency.
As PI/r use is infrequent in Hunan, the PI mutations detected likely represent natural polymorphisms of subtype AE at low variant frequency.
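The prevalence figures above count a subject as positive when any mutation of interest is present at a variant frequency of at least 1%, and separately note how many detected mutations fall below 5%. The sketch below tallies those two quantities from per-subject variant frequencies. The input format, function names and toy data are hypothetical, and neither the WHO TDR list nor HIVdb scoring is modelled.

```python
def threshold_prevalence(subject_freqs, min_freq=0.01):
    """Count subjects with at least one mutation at frequency >= min_freq.

    subject_freqs maps a subject ID to the variant frequencies (0-1) of the
    mutations of interest found in that subject (hypothetical input format).
    Returns (number_positive, fraction_positive).
    """
    if not subject_freqs:
        raise ValueError("no subjects")
    positive = sum(
        1 for freqs in subject_freqs.values() if any(f >= min_freq for f in freqs)
    )
    return positive, positive / len(subject_freqs)


def low_frequency_fraction(subject_freqs, min_freq=0.01, low_cutoff=0.05):
    """Fraction of detected mutations (>= min_freq) with frequency < low_cutoff."""
    detected = [f for freqs in subject_freqs.values() for f in freqs if f >= min_freq]
    if not detected:
        return 0.0
    return sum(1 for f in detected if f < low_cutoff) / len(detected)


# Invented example: s2's only mutation (0.4%) is below the 1% detection level
toy = {"s1": [0.02, 0.30], "s2": [0.004], "s3": [], "s4": [0.01]}
print(threshold_prevalence(toy))  # (2, 0.5)
print(low_frequency_fraction(toy))
```

As a check against the reported counts, 23 of 90 subjects gives round(100 * 23 / 90, 1) = 25.6%, matching the overall NRTI/NNRTI prevalence in the abstract.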
Affiliation(s)
- Zou Xiaobai
- Hunan Provincial Center for Disease Control and Prevention, Changsha, Hunan Province, China
- Chen Xi
- Hunan Provincial Center for Disease Control and Prevention, Changsha, Hunan Province, China
- Hongping Tian
- Yale-China Association, New Haven, Connecticut, United States of America
- Ann B. Williams
- UCLA School of Nursing, Los Angeles, California, United States of America
- Jianmei He
- Hunan Provincial Center for Disease Control and Prevention, Changsha, Hunan Province, China
- Jun Zhen
- Hunan Provincial Center for Disease Control and Prevention, Changsha, Hunan Province, China
- Jennifer Chiarella
- Yale School of Medicine, New Haven, Connecticut, United States of America
- Michael J. Kozal
- Yale School of Medicine, New Haven, Connecticut, United States of America
41
Knief C. Analysis of plant microbe interactions in the era of next generation sequencing technologies. FRONTIERS IN PLANT SCIENCE 2014; 5:216. [PMID: 24904612 PMCID: PMC4033234 DOI: 10.3389/fpls.2014.00216] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Received: 02/15/2014] [Accepted: 04/30/2014] [Indexed: 05/18/2023]
Abstract
Next generation sequencing (NGS) technologies have impressively accelerated research in biological science in recent years by enabling the production of large volumes of sequence data at a drastically lower price per base than traditional sequencing methods. The recent and ongoing developments in the field allow researchers to address questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed, and methods that help to overcome these shortcomings are presented. A focus will be on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant-associated microbiota to demonstrate the value of the new methods.
Affiliation(s)
- Claudia Knief
- Institute of Crop Science and Resource Conservation—Molecular Biology of the Rhizosphere, Faculty of Agriculture, University of Bonn, Bonn, Germany
42
Töpfer A, Marschall T, Bull RA, Luciani F, Schönhuth A, Beerenwinkel N. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput Biol 2014; 10:e1003515. [PMID: 24675810 PMCID: PMC3967922 DOI: 10.1371/journal.pcbi.1003515] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 10/28/2013] [Accepted: 01/31/2014] [Indexed: 11/25/2022]
Abstract
Virus populations can display high genetic diversity within individual hosts. The intra-host collection of viral haplotypes, called viral quasispecies, is an important determinant of virulence, pathogenesis, and treatment outcome. We present HaploClique, a computational approach to reconstruct the structure of a viral quasispecies from next-generation sequencing data as obtained from bulk sequencing of mixed virus samples. We develop a statistical model for paired-end reads accounting for mutations, insertions, and deletions. Using an iterative maximal clique enumeration approach, read pairs are assembled into haplotypes of increasing length, eventually enabling global haplotype assembly. The performance of our quasispecies assembly method is assessed on simulated data for varying population characteristics and sequencing technology parameters. Owing to its paired-end handling, HaploClique compares favorably to state-of-the-art haplotype inference methods. It can reconstruct error-free full-length haplotypes from low coverage samples and detect large insertions and deletions at low frequencies. We applied HaploClique to sequencing data derived from a clinical hepatitis C virus population of an infected patient and discovered a novel deletion of length 357±167 bp that was validated by two independent long-read sequencing experiments. HaploClique is available at https://github.com/armintoepfer/haploclique. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2-5.
Affiliation(s)
- Armin Töpfer
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Rowena A. Bull
- Inflammation and Infection Research Centre, School of Medical Sciences, UNSW, Sydney, Australia
- Fabio Luciani
- Inflammation and Infection Research Centre, School of Medical Sciences, UNSW, Sydney, Australia
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
43
Forbi JC, Campo DS, Purdy MA, Dimitrova ZE, Skums P, Xia GL, Punkova LT, Ganova-Raeva LM, Vaughan G, Ben-Ayed Y, Switzer WM, Khudyakov YE. Intra-host diversity and evolution of hepatitis C virus endemic to Côte d'Ivoire. J Med Virol 2014; 86:765-71. [PMID: 24519518 DOI: 10.1002/jmv.23897] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Accepted: 01/14/2014] [Indexed: 12/12/2022]
Abstract
Hepatitis C virus (HCV) infection presents an important, but underappreciated public health problem in Africa. In Côte d'Ivoire, very little is known about the molecular dynamics of HCV infection. Plasma samples (n = 608) from pregnant women collected in 1995 from Côte d'Ivoire were analyzed in this study. Only 18 specimens (∼3%) were found to be HCV PCR-positive. Phylogenetic analysis of the HCV NS5b sequences showed that the HCV variants belong to genotype 1 (HCV1) (n = 12, 67%) and genotype 2 (HCV2) (n = 6, 33%), with a maximum genetic diversity among HCV variants in each genotype being 20.7% and 24.0%, respectively. Although all HCV2 variants were genetically distant from each other, six HCV1 variants formed two tight sub-clusters belonging to HCV1a and HCV1b. Analysis of molecular variance (AMOVA) showed that the genetic structure of HCV isolates from West Africa, including Côte d'Ivoire, was significantly different from that of Central African strains (P = 0.0001). Examination of intra-host viral populations using next-generation sequencing of the HCV HVR1 showed a significant variation in intra-host genetic diversity among infected individuals, with some strains composed of sub-populations as distant from each other as viral populations from different hosts. Collectively, the results indicate a complex HCV evolution in Côte d'Ivoire, similar to the rest of West Africa, and suggest a unique HCV epidemic history in the country.
Affiliation(s)
- Joseph C Forbi
- Division of Viral Hepatitis, National Center for HIV, Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, Georgia
44
Drug resistance of a viral population and its individual intrahost variants during the first 48 hours of therapy. Clin Pharmacol Ther 2014; 95:627-35. [PMID: 24488144 PMCID: PMC4215939 DOI: 10.1038/clpt.2014.20] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 11/07/2013] [Accepted: 01/22/2014] [Indexed: 12/20/2022]
Abstract
Using HCV and IFN-resistance as a proof of concept, we have devised a new methodology for calculating the effect of a drug on a viral population and the resistance of its individual intra-host variants. By means of next-generation sequencing, HCV variants were obtained from sera collected at 9 time-points from 16 patients during the first 48 hours after injection of IFN-α. IFN-resistance coefficients were calculated for individual variants using changes in their relative frequencies, and for the entire intra-host viral population using changes in viral titer during the initial 48 hours. Population-wide resistance and presence of IFN-resistant variants were highly associated with pegIFN-α2a/RBV treatment outcome at week 12 (p = 3.78 × 10⁻⁵ and 0.0114, respectively). This new method allows accurate measurement of resistance based solely on changes in viral titer or the relative frequency of intra-host viral variants during a short observation time.
45
Glouzon JPS, Bolduc F, Wang S, Najmanovich RJ, Perreault JP. Deep-sequencing of the peach latent mosaic viroid reveals new aspects of population heterogeneity. PLoS One 2014; 9:e87297. [PMID: 24498066 PMCID: PMC3907566 DOI: 10.1371/journal.pone.0087297] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 05/30/2013] [Accepted: 12/24/2013] [Indexed: 01/04/2023]
Abstract
Viroids are small circular single-stranded infectious RNAs characterized by a relatively high mutation level. Knowledge of their sequence heterogeneity remains largely elusive and previous studies, using Sanger sequencing, were based on a limited number of sequences. In an attempt to address sequence heterogeneity from a population dynamics perspective, a GF305-indicator peach tree was infected with a single variant of the Avsunviroidae family member Peach latent mosaic viroid (PLMVd). Six months post-inoculation, full-length circular conformers of PLMVd were isolated and deep-sequenced. We devised an original approach to the bioinformatics refinement of our sequence libraries involving important phenotypic data, based on the systematic analysis of hammerhead self-cleavage activity. Two distinct libraries yielded a total of 3,939 different PLMVd variants. Sequence variants exhibiting up to ∼17% of mutations relative to the inoculated viroid were retrieved, clearly illustrating the high level of divergence dynamics within a unique population. While we initially assumed that most positions of the viroid sequence would mutate, we were surprised to discover that ∼50% of positions remained perfectly conserved, including several small stretches as well as a small motif reminiscent of a GNRA tetraloop which are the result of various selective pressures. Using a hierarchical clustering algorithm, the different variants harvested were subdivided into 7 clusters. We found that most sequences contained an average of 4.6 to 6.4 mutations compared to the variant used to initially inoculate the plant. Interestingly, it was possible to reconstitute and compare the sequence evolution of each of these clusters. In doing so, we identified several key mutations. This study provides a reliable pipeline for the treatment of viroid deep-sequencing. It also sheds new light on the extent of sequence variation that a viroid population can sustain, and which may give rise to a quasispecies.
Affiliation(s)
- Jean-Pierre Sehi Glouzon
- Département d’informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Département de biochimie, Faculté de médecine et des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, Sherbrooke, Québec, Canada
- François Bolduc
- Département de biochimie, Faculté de médecine et des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Shengrui Wang
- Département d’informatique, Faculté des Sciences, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Rafael J. Najmanovich
- Département de biochimie, Faculté de médecine et des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-Pierre Perreault
- Département de biochimie, Faculté de médecine et des sciences de la santé, Pavillon de Recherche Appliquée au Cancer, Université de Sherbrooke, Sherbrooke, Québec, Canada
46
McElroy K, Thomas T, Luciani F. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions. Microb Inform Exp 2014; 4:1. [PMID: 24428920 PMCID: PMC3902414 DOI: 10.1186/2042-5783-4-1] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 10/14/2013] [Accepted: 01/07/2014] [Indexed: 12/15/2022]
Abstract
Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads.
Affiliation(s)
- Kerensa McElroy
- Centre for Marine Bio-Innovation and School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW 2052, Australia.
47
Prosperi MCF, Yin L, Nolan DJ, Lowe AD, Goodenow MM, Salemi M. Empirical validation of viral quasispecies assembly algorithms: state-of-the-art and challenges. Sci Rep 2013; 3:2837. [PMID: 24089188 PMCID: PMC3789152 DOI: 10.1038/srep02837] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 04/03/2013] [Accepted: 09/13/2013] [Indexed: 11/22/2022]
Abstract
Next generation sequencing (NGS) is superseding Sanger technology for analysing intra-host viral populations, in terms of genome length and resolution. We introduce two new empirical validation data sets and test the available viral population assembly software. Two intra-host viral population 'quasispecies' samples (type-1 human immunodeficiency and hepatitis C virus) were Sanger-sequenced, and plasmid clone mixtures at controlled proportions were shotgun-sequenced using Roche's 454 sequencing platform. The performance of different assemblers was compared in terms of phylogenetic clustering and recombination with the Sanger clones. Phylogenetic clustering showed that all assemblers captured a proportion of the most divergent lineages, but none were able to provide a high precision/recall tradeoff. Estimated variant frequencies correlated only mildly with the original clone proportions. Given the limitations of currently available algorithms identified by our empirical validation, the development and exploitation of additional data sets is needed, in order to establish an efficient framework for viral population reconstruction using NGS.
Affiliation(s)
- Mattia C. F. Prosperi
- University of Manchester, Faculty of Medical and Human Sciences, Northwest Institute of Bio-Health Informatics, Centre for Health Informatics, Institute of Population Health, Manchester, UK
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Li Yin
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- David J. Nolan
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Amanda D. Lowe
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- Maureen M. Goodenow
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- Marco Salemi
- University of Florida, College of Medicine, Department of Pathology, Immunology and Laboratory Medicine, Gainesville, Florida, USA
- Florida Center for AIDS Research, Gainesville, Florida, USA
- Emerging Pathogens Institute, Gainesville, Florida, USA
48
McElroy K, Zagordi O, Bull R, Luciani F, Beerenwinkel N. Accurate single nucleotide variant detection in viral populations by combining probabilistic clustering with a statistical test of strand bias. BMC Genomics 2013; 14:501. [PMID: 23879730 PMCID: PMC3848937 DOI: 10.1186/1471-2164-14-501] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 12/19/2012] [Accepted: 07/15/2013] [Indexed: 11/10/2022]
Abstract
Background Deep sequencing is a powerful tool for assessing viral genetic diversity. Such experiments harness the high coverage afforded by next generation sequencing protocols by treating sequencing reads as a population sample. Distinguishing true single nucleotide variants (SNVs) from sequencing errors remains challenging, however. Current protocols are characterised by high false positive rates, with results requiring time consuming manual checking. Results By statistical modelling, we show that if multiple variant sites are considered at once, SNVs can be called reliably from high coverage viral deep sequencing data at frequencies lower than the error rate of the sequencing technology, and that SNV calling accuracy increases as true sequence diversity within a read length increases. We demonstrate these findings on two control data sets, showing that SNV detection is more reliable on a high diversity human immunodeficiency virus sample as compared to a moderate diversity sample of hepatitis C virus. Finally, we show that in situations where probabilistic clustering retains false positive SNVs (for instance due to insufficient sample diversity or systematic errors), applying a strand bias test based on a beta-binomial model of forward read distribution can improve precision, with negligible cost to true positive recall. Conclusions By combining probabilistic clustering (implemented in the program ShoRAH) with a statistical test of strand bias, SNVs may be called from deeply sequenced viral populations with high accuracy.
Affiliation(s)
- Kerensa McElroy
- Centre for Marine Bioinnovation and School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.
49
Skums P, Mancuso N, Artyomenko A, Tork B, Mandoiu I, Khudyakov Y, Zelikovsky A. Reconstruction of viral population structure from next-generation sequencing data using multicommodity flows. BMC Bioinformatics 2013; 14 Suppl 9:S2. [PMID: 23902469 PMCID: PMC3698000 DOI: 10.1186/1471-2105-14-s9-s2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND Highly mutable RNA viruses exist in infected hosts as heterogeneous populations of genetically close variants known as quasispecies. Next-generation sequencing (NGS) allows for analysing a large number of viral sequences from infected patients, presenting a novel opportunity for studying the structure of a viral population and understanding virus evolution, drug resistance and immune escape. Accurate reconstruction of the genetic composition of intra-host viral populations involves assembling the NGS short reads into whole-genome sequences and estimating frequencies of individual viral variants. Although a few approaches have been developed for this task, accurate reconstruction of quasispecies populations remains largely unresolved. RESULTS Two new methods, AmpMCF and ShotMCF, for reconstruction of the whole-genome intra-host viral variants and estimation of their frequencies were developed, based on Multicommodity Flows (MCFs). AmpMCF was designed for NGS reads obtained from individual PCR amplicons and ShotMCF for NGS shotgun reads. While AmpMCF, based on a covering formulation, identifies a minimal set of quasispecies explaining all observed reads, ShotMCF, based on a packing formulation, engages the maximal number of reads to generate the most probable set of quasispecies. Both methods were evaluated on simulated data in comparison to Maximum Bandwidth and ViSpA, previously developed state-of-the-art algorithms for estimating quasispecies spectra from the NGS amplicon and shotgun reads, respectively. Both algorithms were accurate in estimation of quasispecies frequencies, especially from large datasets. CONCLUSIONS The problem of viral population reconstruction from amplicon or shotgun NGS reads was solved using the MCF formulation. The two methods, ShotMCF and AmpMCF, developed here afford accurate reconstruction of the structure of intra-host viral population from NGS reads.
The implementations of the algorithms are available at http://alan.cs.gsu.edu/vira.html (AmpMCF) and http://alan.cs.gsu.edu/NGS/?q=content/shotmcf (ShotMCF).
Affiliation(s)
- Pavel Skums
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Road NE, 30333 Atlanta, GA, USA
- Nicholas Mancuso
- Department of Computer Science, Georgia State University, 34 Peachtree str., 30303, Atlanta, GA, USA
- Alexander Artyomenko
- Department of Computer Science, Georgia State University, 34 Peachtree str., 30303, Atlanta, GA, USA
- Bassam Tork
- Department of Computer Science, Georgia State University, 34 Peachtree str., 30303, Atlanta, GA, USA
- Ion Mandoiu
- Department of Computer Science and Engineering, University of Connecticut, 06269, Storrs, CT, USA
- Yury Khudyakov
- Laboratory of Molecular Epidemiology and Bioinformatics, Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Road NE, 30333 Atlanta, GA, USA
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, 34 Peachtree str., 30303, Atlanta, GA, USA
50
Niklas N, Pröll J, Danzer M, Stabentheiner S, Hofer K, Gabriel C. Routine performance and errors of 454 HLA exon sequencing in diagnostics. BMC Bioinformatics 2013; 14:176. [PMID: 23731822 PMCID: PMC3679934 DOI: 10.1186/1471-2105-14-176] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 11/23/2012] [Accepted: 05/30/2013] [Indexed: 11/25/2022]
Abstract
Background Next-generation sequencing (NGS) has changed genomics significantly. More and more applications strive for sequencing with different platforms. Now, in 2012, after a decade of development and evolution, NGS has been accepted for a variety of research fields. Determination of sequencing errors is essential in order to follow next-generation sequencing beyond research use only. This study describes the overall 454 system performance of using multiple GS Junior runs with an in-house established and validated diagnostic assay for human leukocyte antigen (HLA) exon sequencing. Based on this data, we extracted, evaluated and characterized errors and variants of 60 HLA loci per run with respect to their adjacencies. Results We determined an overall error rate of 0.18% in a total of 118,484,408 bases. 31.3% of all reads analyzed (n=349,503) contain one or more errors. The largest group are deletions that account for 50% of the errors. Incorrect bases are not distributed equally along sequences and tend to be more frequent at sequence ends. Certain sequence positions in the middle or at the beginning of the read accumulate errors. Typically, the corresponding quality score at the actual error position is lower than the adjacent scores. Conclusions Here we present the first error assessment in a human next-generation sequencing diagnostics assay in an amplicon sequencing approach. Improvements of sequence quality and error rate that have been made over the years are evident and it is shown that both have now reached a level where diagnostic applications become feasible. Our presented data are better than previously published error rates and we can confirm and quantify the often described relation of homopolymers and errors. Nevertheless, a certain depth of coverage is needed, in particular with challenging areas of the sequencing target. 
Furthermore, the usage of error correcting tools is not essential but might contribute towards the capacity and efficiency of a sequencing run.
Affiliation(s)
- Norbert Niklas
- Red Cross Transfusion Service for Upper Austria, Krankenhausstraße 7, 4017 Linz, Austria.