1
Song W, Zhang S, Majzoub ME, Egan S, Kjelleberg S, Thomas T. The impact of interspecific competition on the genomic evolution of Phaeobacter inhibens and Pseudoalteromonas tunicata during biofilm growth. Environ Microbiol 2024; 26:e16553. [PMID: 38062568 DOI: 10.1111/1462-2920.16553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 11/24/2023] [Indexed: 01/30/2024]
Abstract
Interspecific interactions in biofilms have been shown to cause the emergence of community-level properties. To understand the impact of interspecific competition on evolution, we deep-sequenced the dispersal population of mono- and co-culture biofilms of two antagonistic marine bacteria (Phaeobacter inhibens 2.10 and Pseudoalteromonas tunicata D2). Enhanced phenotypic and genomic diversification was observed in the P. tunicata D2 populations under both mono- and co-culture biofilms in comparison to P. inhibens 2.10. The genetic variation was exclusively due to single nucleotide variants and small deletions, and showed high variability between replicates, indicating their random emergence. Interspecific competition exerted an apparent strong positive selection on a subset of P. inhibens 2.10 genes (e.g., luxR, cobC, argH, and sinR) that could facilitate competition, while the P. tunicata D2 population was genetically constrained under competition conditions. In the absence of interspecific competition, the P. tunicata D2 replicate populations displayed high levels of mutations affecting the same genes involved in cell motility and biofilm formation. Our results show that interspecific biofilm competition has a complex impact on genomic diversification, which likely depends on the nature of the competing strains and their ability to generate genetic variants due to their genomic constraints.
Affiliation(s)
- Weizhi Song
  - Centre for Marine Science and Innovation, School of Biological, Earth and Environmental Sciences, Faculty of Science, The University of New South Wales, Kensington, New South Wales, Australia
  - School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, New South Wales, Australia
- Shan Zhang
  - Centre for Marine Science and Innovation, School of Biological, Earth and Environmental Sciences, Faculty of Science, The University of New South Wales, Kensington, New South Wales, Australia
  - School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, New South Wales, Australia
- Marwan E Majzoub
  - Centre for Marine Science and Innovation, School of Biological, Earth and Environmental Sciences, Faculty of Science, The University of New South Wales, Kensington, New South Wales, Australia
  - School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, New South Wales, Australia
- Suhelen Egan
  - Centre for Marine Science and Innovation, School of Biological, Earth and Environmental Sciences, Faculty of Science, The University of New South Wales, Kensington, New South Wales, Australia
  - School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, New South Wales, Australia
- Staffan Kjelleberg
  - Centre for Marine Science and Innovation, School of Biological, Earth and Environmental Sciences, Faculty of Science, The University of New South Wales, Kensington, New South Wales, Australia
  - School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, New South Wales, Australia
  - Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Torsten Thomas
  - Centre for Marine Science and Innovation, School of Biological, Earth and Environmental Sciences, Faculty of Science, The University of New South Wales, Kensington, New South Wales, Australia
  - School of Biological, Earth and Environmental Sciences, University of New South Wales, Kensington, New South Wales, Australia
2
Fu H, Zhang C, Wang Y, Chen G. Advances in multiplex molecular detection technologies for harmful algae. Environmental Science and Pollution Research International 2022; 29:43745-43757. [PMID: 35449333 DOI: 10.1007/s11356-022-20269-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/02/2022] [Accepted: 04/11/2022] [Indexed: 06/14/2023]
Abstract
As the eutrophication of natural water bodies becomes more and more serious, the frequency of outbreaks of harmful algal blooms (HABs) mainly formed by harmful algae also increases. HABs have become a global ecological problem that poses a serious threat to human health and food safety. Therefore, it is extremely important to establish methods that can rapidly detect harmful algal species for early warning of HABs. The traditional morphology-based identification method is inefficient and inaccurate. In recent years, the rapid development of molecular biology techniques has provided new ideas for the detection of harmful algae and has become a research hotspot. The current molecular detection methods for harmful algal species mainly include fluorescence in situ hybridization, sandwich hybridization, and quantitative PCR (qPCR), but all of these methods can only detect single harmful algal species at a time. The establishment of methods for the simultaneous detection of multiple harmful algal species has become a new trend in the development of molecular detection technology because various harmful algal species may coexist in the natural water environment. The established molecular techniques for multiple detections of harmful algae mainly include gene chip, multiplex PCR, multiplex qPCR, massively parallel sequencing, antibody chip, and multiple isothermal amplification. This review mainly focuses on the principles, advantages and disadvantages, application progress, and application prospects of these multiple detection technologies, aiming at providing effective references not only for the fisheries but also for economic activities, environment, and human health.
Affiliation(s)
- Hanyu Fu
  - College of Oceanology, Harbin Institute of Technology (Weihai), Weihai, 264209, People's Republic of China
- Chunyun Zhang
  - College of Oceanology, Harbin Institute of Technology (Weihai), Weihai, 264209, People's Republic of China
- Yuanyuan Wang
  - College of Oceanology, Harbin Institute of Technology (Weihai), Weihai, 264209, People's Republic of China
- Guofu Chen
  - College of Oceanology, Harbin Institute of Technology (Weihai), Weihai, 264209, People's Republic of China
  - School of Environment, Harbin Institute of Technology, Harbin, 150009, People's Republic of China
3
Genomic evolution of the marine bacterium Phaeobacter inhibens during biofilm growth. Appl Environ Microbiol 2021; 87:e0076921. [PMID: 34288701 DOI: 10.1128/aem.00769-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
P. inhibens 2.10 is an effective biofilm former on marine surfaces and has the ability to outcompete other microorganisms, possibly due to the production of the plasmid-encoded secondary metabolite tropodithietic acid (TDA). P. inhibens 2.10 biofilms produce phenotypic variants with reduced competitiveness compared to the wild-type. In the present study, we used longitudinal, genome-wide deep sequencing to uncover the genetic foundation that contributes to the emergent phenotypic diversity in P. inhibens 2.10 biofilm dispersants. Our results show that phenotypic variation is not due to the loss of the plasmid that encodes the genes for TDA synthesis, but instead show that P. inhibens 2.10 biofilm populations become rapidly enriched in single nucleotide variations in genes involved in the synthesis of TDA. While variants in genes previously linked to other phenotypes, such as lipopolysaccharide production (i.e. rfbA) and cellular persistence (i.e. metG), also appear to be selected for during biofilm dispersal, the number and consistency of variations found for genes involved in TDA production suggest that this metabolite imposes a burden for P. inhibens 2.10 cells. Our results indicate a strong selection pressure for the loss of TDA in mono-species biofilm populations and provide insight into how competition (or lack thereof) in biofilms might shape genome evolution in bacteria.
Importance Statement: Biofilm formation and dispersal are important survival strategies for environmental bacteria. During biofilm dispersal cells often display stable and heritable variants from the parental biofilm. Phaeobacter inhibens is an effective colonizer of marine surfaces, in which a subpopulation of its biofilm dispersal cells displays a non-competitive phenotype. This study aimed to elucidate the genetic basis of these phenotypic changes. Despite the progress made to date in characterizing the dispersal variants in P. inhibens, little is understood about the underlying genetic changes that result in the development of the specific variants. Here, P. inhibens phenotypic variation was linked to single nucleotide polymorphisms (SNPs), in particular in genes affecting the competitive ability of P. inhibens, including genes related to the production of the antibiotic tropodithietic acid (TDA) and bacterial cell-cell communication (e.g. quorum sensing). This work is significant as it reveals how the biofilm-lifestyle might shape genome evolution in a cosmopolitan bacterium.
4
Lopatkin AJ, Bening SC, Manson AL, Stokes JM, Kohanski MA, Badran AH, Earl AM, Cheney NJ, Yang JH, Collins JJ. Clinically relevant mutations in core metabolic genes confer antibiotic resistance. Science 2021; 371:371/6531/eaba0862. [PMID: 33602825 DOI: 10.1126/science.aba0862] [Citation(s) in RCA: 157] [Impact Index Per Article: 52.3] [Received: 11/03/2019] [Revised: 09/16/2020] [Accepted: 12/18/2020] [Indexed: 12/17/2022]
Abstract
Although metabolism plays an active role in antibiotic lethality, antibiotic resistance is generally associated with drug target modification, enzymatic inactivation, and/or transport rather than metabolic processes. Evolution experiments of Escherichia coli rely on growth-dependent selection, which may provide a limited view of the antibiotic resistance landscape. We sequenced and analyzed E. coli adapted to representative antibiotics at increasingly heightened metabolic states. This revealed various underappreciated noncanonical genes, such as those related to central carbon and energy metabolism, which are implicated in antibiotic resistance. These metabolic alterations lead to lower basal respiration, which prevents antibiotic-mediated induction of tricarboxylic acid cycle activity, thus avoiding metabolic toxicity and minimizing drug lethality. Several of the identified metabolism-specific mutations are overrepresented in the genomes of >3500 clinical E. coli pathogens, indicating clinical relevance.
Affiliation(s)
- Allison J Lopatkin
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Wyss Institute for Biologically Inspired Engineering; Harvard University, Boston, MA, USA
  - Department of Biology, Barnard College, New York, NY, USA
  - Data Science Institute, Columbia University, New York, NY, USA
  - Ecology, Evolution, and Environmental Biology, Columbia University, New York, NY, USA
- Sarah C Bening
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Abigail L Manson
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jonathan M Stokes
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Wyss Institute for Biologically Inspired Engineering; Harvard University, Boston, MA, USA
- Michael A Kohanski
  - Department of Otorhinolaryngology-Head and Neck Surgery, Division of Rhinology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ahmed H Badran
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ashlee M Earl
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicole J Cheney
  - Ruy V. Lourenço Center for Emerging and Re-Emerging Pathogens, Rutgers New Jersey Medical School, Newark, NJ, USA
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, NJ, USA
- Jason H Yang
  - Ruy V. Lourenço Center for Emerging and Re-Emerging Pathogens, Rutgers New Jersey Medical School, Newark, NJ, USA
  - Department of Microbiology, Biochemistry and Molecular Genetics, Rutgers New Jersey Medical School, Newark, NJ, USA
- James J Collins
  - Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Wyss Institute for Biologically Inspired Engineering; Harvard University, Boston, MA, USA
  - Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA
5
Laugel E, Hartard C, Jeulin H, Berger S, Venard V, Bronowicki JP, Schvoerer E. Full-length genome sequencing of RNA viruses-How the approach can enlighten us on hepatitis C and hepatitis E viruses. Rev Med Virol 2020; 31:e2197. [PMID: 34260779 DOI: 10.1002/rmv.2197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/09/2022]
Abstract
Among the five main viruses responsible for human hepatitis, hepatitis C virus (HCV) and hepatitis E virus (HEV) are different while sharing similarities. Both viruses can be transmitted by blood or derivatives whereas HEV can also follow environmental or zoonotic routes. These highly variable RNA viruses can cause chronic hepatitis potentially leading to hepatocarcinoma. HCV and HEV can develop new structures and functions under selective pressure to adapt to host immunity, human tissues, treatments or even various animal reservoirs. Elsewhere, with directly acting antiviral treatments, HCV can be eradicated whereas HEV is an emerging pathogen against which specific treatments have to be improved. As a unique molecular tool able to explore viral genomic plasticity, full-length genome (FLG) sequencing has become easier, faster and cheaper. The present review will show how FLG sequencing can explore these RNA viruses with the aim to investigate key genomics data to improve basic knowledge, patients' healthcare and preventive tools.
Affiliation(s)
- Elodie Laugel
  - Université de Lorraine, Vandœuvre-lès-Nancy, France
  - Laboratoire de Virologie, CHRU de Nancy Brabois, Vandœuvre-lès-Nancy, France
  - Laboratoire de Chimie Physique et Microbiologie pour les Matériaux et l'Environnement (LCPME), UMR 7564 CNRS-UL, Vandœuvre-lès-Nancy, France
- Cédric Hartard
  - Université de Lorraine, Vandœuvre-lès-Nancy, France
  - Laboratoire de Virologie, CHRU de Nancy Brabois, Vandœuvre-lès-Nancy, France
  - Laboratoire de Chimie Physique et Microbiologie pour les Matériaux et l'Environnement (LCPME), UMR 7564 CNRS-UL, Vandœuvre-lès-Nancy, France
- Hélène Jeulin
  - Université de Lorraine, Vandœuvre-lès-Nancy, France
  - Laboratoire de Virologie, CHRU de Nancy Brabois, Vandœuvre-lès-Nancy, France
  - Laboratoire de Chimie Physique et Microbiologie pour les Matériaux et l'Environnement (LCPME), UMR 7564 CNRS-UL, Vandœuvre-lès-Nancy, France
- Sibel Berger
  - Laboratoire de Virologie, CHRU de Nancy Brabois, Vandœuvre-lès-Nancy, France
- Véronique Venard
  - Université de Lorraine, Vandœuvre-lès-Nancy, France
  - Laboratoire de Virologie, CHRU de Nancy Brabois, Vandœuvre-lès-Nancy, France
- Jean-Pierre Bronowicki
  - Université de Lorraine, Vandœuvre-lès-Nancy, France
  - Service d'hépato-gastroentérologie, CHRU de Nancy Brabois, Vandœuvre-lès-Nancy, France
- Evelyne Schvoerer
  - Université de Lorraine, Vandœuvre-lès-Nancy, France
  - Laboratoire de Virologie, CHRU de Nancy Brabois, Vandœuvre-lès-Nancy, France
  - Laboratoire de Chimie Physique et Microbiologie pour les Matériaux et l'Environnement (LCPME), UMR 7564 CNRS-UL, Vandœuvre-lès-Nancy, France
6
Turakhia Y, De Maio N, Thornlow B, Gozashti L, Lanfear R, Walker CR, Hinrichs AS, Fernandes JD, Borges R, Slodkowicz G, Weilguny L, Haussler D, Goldman N, Corbett-Detig R. Stability of SARS-CoV-2 phylogenies. PLoS Genet 2020; 16:e1009175. [PMID: 33206635 PMCID: PMC7721162 DOI: 10.1371/journal.pgen.1009175] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 06/11/2020] [Revised: 12/07/2020] [Accepted: 10/06/2020] [Indexed: 12/23/2022] Open
Abstract
The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.
Affiliation(s)
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Bryan Thornlow
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Landen Gozashti
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, United States of America
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Conor R. Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
- Angie S. Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
- Jason D. Fernandes
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Howard Hughes Medical Institute, University of California, Santa Cruz, CA, United States of America
- Rui Borges
  - Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Greg Slodkowicz
  - MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Lukas Weilguny
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- David Haussler
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Howard Hughes Medical Institute, University of California, Santa Cruz, CA, United States of America
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, United States of America
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, United States of America
7
External Quality Assessment for Next-Generation Sequencing-Based HIV Drug Resistance Testing: Unique Requirements and Challenges. Viruses 2020; 12:v12050550. [PMID: 32429382 PMCID: PMC7291216 DOI: 10.3390/v12050550] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/21/2020] [Revised: 05/09/2020] [Accepted: 05/14/2020] [Indexed: 12/25/2022] Open
Abstract
Over the past decade, there has been an increase in the adoption of next generation sequencing (NGS) technologies for HIV drug resistance (HIVDR) testing. NGS far outweighs conventional Sanger sequencing as it has much higher throughput, lower cost when samples are batched and, most importantly, significantly higher sensitivities for variants present at low frequencies, which may have significant clinical implications. Despite the advantages of NGS, Sanger sequencing remains the gold standard for HIVDR testing, largely due to the lack of standardization of NGS-based HIVDR testing. One important aspect of standardization includes external quality assessment (EQA) strategies and programs. Current EQA for Sanger-based HIVDR testing includes proficiency testing where samples are sent to labs and the performance of the lab conducting such assays is evaluated. The current methods for Sanger-based EQA may not apply to NGS-based tests because of the fundamental differences in their technologies and outputs. Sanger-based genotyping reports drug resistance mutations (DRMs) data as dichotomous, whereas NGS-based HIVDR genotyping also reports DRMs as numerical data (percent abundance). Here we present an overview of the need to develop EQA for NGS-based HIVDR testing and some unique challenges that may be encountered.
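The difference between dichotomous and percent-abundance reporting described in this abstract can be sketched as follows. This is a minimal illustration only, not part of any EQA programme: the mutation list, the frequencies, and the 20% and 1% reporting thresholds are all assumptions chosen for the example.

```python
# Hypothetical NGS output for one sample: mutation -> percent abundance.
# The values below are invented for illustration.
ngs_calls = {"K103N": 45.0, "M184V": 8.5, "K65R": 0.6}


def dichotomize(calls, threshold_pct):
    """Collapse percent-abundance calls to present/absent at a threshold."""
    return {drm: freq >= threshold_pct for drm, freq in calls.items()}


# Assumed thresholds: a high one standing in for Sanger-like sensitivity,
# and a low one standing in for an NGS reporting cutoff.
sanger_like = dichotomize(ngs_calls, 20.0)
ngs_report = dichotomize(ngs_calls, 1.0)

print(sanger_like)
print(ngs_report)
```

Under these assumed thresholds the same sample yields different present/absent calls, which suggests why a proficiency scheme for NGS would likely need to score the reported frequency and the chosen threshold, not only presence or absence.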
8
McElhoe JA, Holland MM. Characterization of background noise in MiSeq MPS data when sequencing human mitochondrial DNA from various sample sources and library preparation methods. Mitochondrion 2020; 52:40-55. [PMID: 32068127 DOI: 10.1016/j.mito.2020.02.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/30/2019] [Revised: 12/18/2019] [Accepted: 02/12/2020] [Indexed: 12/20/2022]
Abstract
Improved resolution of massively parallel sequencing (MPS) allows for the characterization of mitochondrial (mt) DNA heteroplasmy to levels previously unattainable with traditional sequencing approaches. An essential criterion for the reporting of heteroplasmy is the ability of the MPS method to distinguish minor sequence variants (MSVs) from system noise, or error. Therefore, an assessment of the background noise in the MPS method is desirable to identify the point at which reliable data can be reported. Substitution and sequence specific error (SSE) was evaluated for a variety of sample types and two library preparations. Substitution error rates ranged from 0.18 to 0.49 per 100 nucleotides with C positions generally having the highest rate of misincorporation. Comparison of error rates across sample types indicated a significant increase for samples with damaged DNA. The positions of error were varied across datasets (pairwise concordance 0-68%), but had greater consistency within the damaged samples (80-96%). The most commonly observed motif preceding error in forward reads was CCG, while GGT was most common in reverse reads, both consistent with previous findings. The findings illustrate that for datasets containing samples with damaged DNA, reporting thresholds for heteroplasmy may have to be modified and individual sites with error levels exceeding thresholds should be scrutinized. Collectively, the shifting error profiles observed across the various sample types and library preparation methods demonstrate the need for an assessment of error under these varying circumstances. Characterization of the applicable background noise will help to ensure that thresholds are reliably set for detection of true MSVs.
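The error-rate unit used in this abstract (substitutions per 100 nucleotides) and the idea of a reporting threshold can be sketched in a few lines. The function names, the read counts, and the 1% threshold are assumptions made for illustration; they are not the authors' pipeline or their recommended cutoff.

```python
def substitution_error_rate(mismatches, bases_sequenced):
    """Return substitution errors per 100 nucleotides sequenced."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return 100.0 * mismatches / bases_sequenced


def exceeds_threshold(minor_reads, depth, threshold_pct):
    """True if the minor-variant frequency at a site is above the threshold."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return 100.0 * minor_reads / depth > threshold_pct


# Invented counts: 1,800 mismatched bases out of 1,000,000 sequenced.
rate = substitution_error_rate(1800, 1_000_000)

# Invented sites at depth 2,000 with an assumed 1% reporting threshold.
site_a = exceeds_threshold(30, 2000, 1.0)  # 1.5% minor frequency
site_b = exceeds_threshold(5, 2000, 1.0)   # 0.25% minor frequency
```

With these invented counts the rate works out to 0.18 per 100 nucleotides, the low end of the range quoted above. A site such as the first one would be a candidate heteroplasmy call, while the second would fall within what this sketch treats as background noise.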
Affiliation(s)
- Jennifer A McElhoe
  - Department of Biochemistry & Molecular Biology, Forensic Science Program, The Pennsylvania State University, University Park, PA 16802, USA
- Mitchell M Holland
  - Department of Biochemistry & Molecular Biology, Forensic Science Program, The Pennsylvania State University, University Park, PA 16802, USA
9
Patel V, Spouge JL. Estimating the basic reproduction number of a pathogen in a single host when only a single founder successfully infects. PLoS One 2020; 15:e0227127. [PMID: 31923263 PMCID: PMC6953795 DOI: 10.1371/journal.pone.0227127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2019] [Accepted: 12/12/2019] [Indexed: 11/27/2022] Open
Abstract
If viruses or other pathogens infect a single host, the outcome of infection may depend on the initial basic reproduction number R0, the expected number of host cells infected by a single infected cell. This article shows that sometimes, phylogenetic models can estimate the initial R0, using only sequences sampled from the pathogenic population during its exponential growth or shortly thereafter. When evaluated by simulations mimicking the bursting viral reproduction of HIV and simultaneous sampling of HIV gp120 sequences during early viremia, the estimated R0 displayed useful accuracies in achievable experimental designs. Estimates of R0 have several potential applications to investigators interested in the progress of infection in single hosts, including: (1) timing a pathogen’s movement through different microenvironments; (2) timing the change points in a pathogen’s mode of spread (e.g., timing the change from cell-free spread to cell-to-cell spread, or vice versa, in an HIV infection); (3) quantifying the impact different initial microenvironments have on pathogens (e.g., in mucosal challenge with HIV, quantifying the impact that the presence or absence of mucosal infection has on R0); (4) quantifying subtle changes in infectability in therapeutic trials (either human or animal), even when therapies do not produce total sterilizing immunity; and (5) providing a variable predictive of the clinical efficacy of prophylactic therapies.
Affiliation(s)
- Vruj Patel
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- John L. Spouge
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
10
Chen J, Shang J, Wang J, Sun Y. A binning tool to reconstruct viral haplotypes from assembled contigs. BMC Bioinformatics 2019; 20:544. [PMID: 31684876 PMCID: PMC6829986 DOI: 10.1186/s12859-019-3138-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/01/2019] [Accepted: 10/09/2019] [Indexed: 11/21/2022] Open
Abstract
Background: Infections by RNA viruses such as influenza and HIV still pose a serious threat to human health despite extensive research on viral diseases. One challenge for producing effective prevention and treatment strategies is high intra-species genetic diversity. As different strains may have different biological properties, characterizing the genetic diversity is thus important to vaccine and drug design. Next-generation sequencing technology enables comprehensive characterization of both known and novel strains and has been widely adopted for sequencing viral populations. However, genome-scale reconstruction of haplotypes is still a challenging problem. In particular, haplotype assembly programs often produce contigs rather than full genomes. As a mutation in one gene can mask the phenotypic effects of a mutation at another locus, clustering these contigs into genome-scale haplotypes is still needed.
Results: We developed a contig binning tool, VirBin, which clusters contigs into different groups so that each group represents a haplotype. Commonly used features based on sequence composition and contig coverage cannot effectively distinguish viral haplotypes because of their high sequence similarity and heterogeneous sequencing coverage for RNA viruses. VirBin applied prototype-based clustering to cluster regions that are more likely to contain mutations specific to a haplotype. The tool was tested on multiple simulated sequencing data with different haplotype abundance distributions and contig sizes, and also on mock quasispecies sequencing data. The benchmark results with other contig binning tools demonstrated the superior sensitivity and precision of VirBin in contig binning for viral haplotype reconstruction.
Conclusions: In this work, we presented VirBin, a new contig binning tool for distinguishing contigs from different viral haplotypes with high sequence similarity. It competes favorably with other tools on viral contig binning. The source code is available at: https://github.com/chjiao/VirBin.
Affiliation(s)
- Jiao Chen
- Computer Science and Engineering, Michigan State University, East Lansing, 48824, USA
| | - Jiayu Shang
- Electrical Engineering, City University of Hong Kong, Hong Kong, China
| | - Jianrong Wang
- Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, 48824, USA
| | - Yanni Sun
- Electrical Engineering, City University of Hong Kong, Hong Kong, China.
| |
|
11
|
Henningsson R, Moratorio G, Bordería AV, Vignuzzi M, Fontes M. DISSEQT-DIStribution-based modeling of SEQuence space Time dynamics. Virus Evol 2019; 5:vez028. [PMID: 31392032 PMCID: PMC6680062 DOI: 10.1093/ve/vez028] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022] Open
Abstract
Rapidly evolving microbes are a challenge to model because of the volatile, complex, and dynamic nature of their populations. We developed the DISSEQT pipeline (DIStribution-based SEQuence space Time dynamics) for analyzing, visualizing, and predicting the evolution of heterogeneous biological populations in multidimensional genetic space, suited for population-based modeling of deep sequencing and high-throughput data. The pipeline is openly available on GitHub (https://github.com/rasmushenningsson/DISSEQT.jl, accessed 23 June 2019) and Synapse (https://www.synapse.org/#!Synapse:syn11425758, accessed 23 June 2019), covering the entire workflow from read alignment to visualization of results. Our pipeline is centered around robust dimension and model reduction algorithms for analysis of genotypic data with additional capabilities for including phenotypic features to explore dynamic genotype-phenotype maps. We illustrate its utility and capacity with examples from evolving RNA virus populations, which present one of the highest degrees of genetic heterogeneity within a given population found in nature. Using our pipeline, we empirically reconstruct the evolutionary trajectories of evolving populations in sequence space and genotype-phenotype fitness landscapes. We show that while sequence space is vastly multidimensional, the relevant genetic space of evolving microbial populations is of intrinsically low dimension. In addition, evolutionary trajectories of these populations can be faithfully monitored to identify the key minority genotypes contributing most to evolution. Finally, we show that empirical fitness landscapes, when reconstructed to include minority variants, can predict phenotype from genotype with high accuracy.
Affiliation(s)
- R Henningsson
- The Centre for Mathematical Sciences, Lund University, Sweden
- Viral Populations and Pathogenesis Unit, Institut Pasteur, Paris, France
- The International Group for Data Analysis, Institut Pasteur, Paris, France
- Division of Clinical Genetics, Lund University, Sweden
| | - G Moratorio
- Viral Populations and Pathogenesis Unit, Institut Pasteur, Paris, France
- Laboratorio de Virología Molecular, Universidad de la República, Montevideo, Uruguay
| | - A V Bordería
- The International Group for Data Analysis, Institut Pasteur, Paris, France
| | - M Vignuzzi
- Viral Populations and Pathogenesis Unit, Institut Pasteur, Paris, France
| | - M Fontes
- The International Group for Data Analysis, Institut Pasteur, Paris, France
- Department of Cancer Immunology, Genentech, South San Francisco, CA, USA
- The Center for Genomic Medicine, Rigshospitalet, Copenhagen, Denmark
- Persimune, The Centre of Excellence for Personalized Medicine, Copenhagen, Denmark
| |
|
12
|
Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019; 15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022] Open
Abstract
Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, bacteria in particular can take a shortcut via HGT, which enables them to pass on genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping that provides complementary evidence compared to existing methods. However, Daisy relies on the acceptor and donor organism involved in the HGT being known. We introduce DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor reference candidates of an HGT event based on sequencing reads. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties such that these methods cannot be directly applied. DaisyGPS uses MicrobeGPS, a metagenomic profiling tool tailored towards estimating the genomic distance between organisms in the sample and the reference database. We enhance the underlying scoring system of MicrobeGPS to account for the sequence patterns in terms of mapping coverage of an acceptor and donor involved in an HGT event, and report a ranked list of reference candidates. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in an investigation of an outbreak involving methicillin-resistant Staphylococcus aureus data.
Affiliation(s)
- Enrico Seiler
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, and Algorithmic Bioinformatics, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
| | - Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Bernhard Y. Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| |
|
13
|
Aeschlimann SH, Graf C, Mayilo D, Lindecker H, Urda L, Kappes N, Burr AL, Simonis M, Splinter E, Min M, Laux H. Enhanced CHO Clone Screening: Application of Targeted Locus Amplification and Next‐Generation Sequencing Technologies for Cell Line Development. Biotechnol J 2019; 14:e1800371. [DOI: 10.1002/biot.201800371] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/29/2018] [Revised: 12/20/2018] [Indexed: 12/20/2022]
Affiliation(s)
- Samuel H. Aeschlimann
- Novartis Institutes for BioMedical Research, Integrated Biologics Profiling Unit, CH‐4002 Basel, Switzerland
| | - Christian Graf
- Novartis Technical R&D, Technical Development Biosimilars, Hexal AG, Keltenring 1+3, 82041 Oberhaching, Germany
| | - Dmytro Mayilo
- Novartis Institutes for BioMedical Research, Integrated Biologics Profiling Unit, CH‐4002 Basel, Switzerland
| | - Hélène Lindecker
- Novartis Institutes for BioMedical Research, Integrated Biologics Profiling Unit, CH‐4002 Basel, Switzerland
| | - Lorena Urda
- Novartis Institutes for BioMedical Research, Integrated Biologics Profiling Unit, CH‐4002 Basel, Switzerland
| | - Nora Kappes
- Novartis Institutes for BioMedical Research, Integrated Biologics Profiling Unit, CH‐4002 Basel, Switzerland
| | - Alicia Leone Burr
- Novartis Institutes for BioMedical Research, Integrated Biologics Profiling Unit, CH‐4002 Basel, Switzerland
| | | | - Erik Splinter
- Cergentis B.V., Yalelaan 62, 3584 CM Utrecht, The Netherlands
| | - Max Min
- Cergentis B.V., Yalelaan 62, 3584 CM Utrecht, The Netherlands
| | - Holger Laux
- Novartis Institutes for BioMedical Research, Integrated Biologics Profiling Unit, CH‐4002 Basel, Switzerland
| |
|
14
|
Liao KH, Hon WK, Tang CY, Hsieh WP. MetaSMC: a coalescent-based shotgun sequence simulator for evolving microbial populations. Bioinformatics 2019; 35:1677-1685. [PMID: 30321266 DOI: 10.1093/bioinformatics/bty840] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/27/2018] [Revised: 09/09/2018] [Accepted: 10/11/2018] [Indexed: 01/26/2023] Open
Abstract
MOTIVATION: High-throughput sequencing technology has revolutionized the study of metagenomics and cancer evolution. In a relatively simple environment, metagenomic sequencing data are dominated by a few species. By analyzing the alignment of reads from microbial species, single nucleotide polymorphisms can be discovered and the evolutionary history of the populations can be reconstructed. The ever-increasing read length will allow more detailed analysis of the evolutionary history of microbial or tumor cell populations. A simulator of shotgun sequences from such populations will be helpful in the development or evaluation of analysis algorithms. RESULTS: Here, we describe an efficient algorithm, MetaSMC, which simulates reads from evolving microbial populations. Based on coalescent theory, our simulator supports all evolutionary scenarios supported by other coalescent simulators. In addition, the simulator supports various substitution models, including Jukes-Cantor, HKY85 and generalized time-reversible models. The simulator also supports mutator phenotypes by allowing different mutation rates and substitution models in different subpopulations. Our algorithm ignores unnecessary chromosomal segments and thus is more efficient than the standard coalescent when recombination is frequent. We showed that the process behind our algorithm is equivalent to the Sequentially Markov Coalescent with an incomplete sample. The accuracy of our algorithm was evaluated by summary statistics and likelihood curves derived from Monte Carlo integration over a large number of random genealogies. AVAILABILITY AND IMPLEMENTATION: MetaSMC is written in C. The source code is available at https://github.com/tarjxvf/metasmc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ki-Hok Liao
- Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
| | - Wing-Kai Hon
- Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan
| | - Chuan-Yi Tang
- Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan; Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
| | - Wen-Ping Hsieh
- Institute of Statistics, National Tsing-Hua University, Hsinchu, Taiwan
| |
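The Jukes-Cantor model named in the MetaSMC abstract is the simplest of the listed substitution models. The following is a minimal sketch of its transition probabilities, assuming a parameterization in which `mu` is the expected number of substitutions per site per unit time. The function name and parameterization are illustrative assumptions and are not taken from the MetaSMC source code.

```python
import math

def jc69_transition_prob(same: bool, mu: float, t: float) -> float:
    """Jukes-Cantor (JC69) probability of observing a given base after time t.

    mu is the expected number of substitutions per site per unit time
    (an assumed parameterization chosen for this illustration).
    same=True  -> probability that the base is unchanged.
    same=False -> probability of one specific different base.
    """
    decay = math.exp(-4.0 * mu * t / 3.0)
    if same:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay
```

The four probabilities for a starting base (one unchanged, three changed) sum to 1, and all of them approach 1/4 as `t` grows, which is the equal-base-frequency equilibrium that defines this model.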
|
15
|
Olmstead AD, Montoya V, Chui CK, Dong W, Joy JB, Tai V, Poon AFY, Nguyen T, Brumme CJ, Martinello M, Matthews GV, Richard Harrigan P, Dore GJ, Applegate TL, Grebely J, Howe AYM. A systematic, deep sequencing-based methodology for identification of mixed-genotype hepatitis C virus infections. Infect Genet Evol 2019; 69:76-84. [PMID: 30654177 DOI: 10.1016/j.meegid.2019.01.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/15/2018] [Revised: 01/10/2019] [Accepted: 01/13/2019] [Indexed: 02/06/2023]
Abstract
Hepatitis C virus (HCV) mixed genotype infections can affect treatment outcomes and may have implications for vaccine design and disease progression. Previous studies demonstrate that 0-39% of high-risk, HCV-infected individuals harbor mixed genotypes; however, standardized, sensitive methods of detection are lacking. This study compared PCR amplicon, random primer (RP), and probe enrichment (PE)-based deep sequencing methods coupled with a custom sequence analysis pipeline to detect multiple HCV genotypes. Mixed infection cutoff values, based on HCV read depth and coverage, were identified using receiver operating characteristic curve analysis. The methodology was validated using artificially mixed genotype samples and then applied to two clinical trials of HCV treatment in high-risk individuals (ACTIVATE, 114 samples from 90 individuals; DARE-C II, 26 samples from 18 individuals) and a cohort of HIV/HCV co-infected individuals (Canadian Coinfection Cohort (CCC), 3 samples from 2 individuals with suspected mixed genotype infections). Amplification bias of genotype (G)1b, G2, G3 and G5 was observed in artificially mixed samples using the PCR method while no genotype bias was observed using RP and PE. RP and PE sequencing of 140 ACTIVATE and DARE-C II samples identified the following primary genotypes: 15% (n = 21) G1a, 76% (n = 106) G3, and 9% (n = 13) G2. Sequencing of ACTIVATE and DARE-C II demonstrated, on average, 2% and 1% of HCV reads mapping to a second genotype using RP and PE, respectively; however, none passed the mixed infection cutoff criteria and phylogenetics confirmed no mixed infections. From CCC, one mixed infection was confirmed while the other was determined to be a recombinant genotype. This study underlines the risk for false identification of mixed HCV infections and stresses the need for standardized methods to improve prevalence estimates and to understand the impact of mixed infections for management and elimination of HCV.
Affiliation(s)
| | | | - Celia K Chui
- BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
| | - Winnie Dong
- BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
| | - Jeffrey B Joy
- BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada; Faculty of Medicine, Department of Medicine, Division of AIDS, University of British Columbia, Vancouver, BC, Canada
| | - Vera Tai
- BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
| | - Art F Y Poon
- BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
| | - Thuy Nguyen
- BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
| | | | | | | | - P Richard Harrigan
- Faculty of Medicine, Department of Medicine, Division of AIDS, University of British Columbia, Vancouver, BC, Canada
| | - Gregory J Dore
- UNSW Sydney, The Kirby Institute, Sydney, NSW, Australia
| | | | - Jason Grebely
- UNSW Sydney, The Kirby Institute, Sydney, NSW, Australia
| | - Anita Y M Howe
- BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
| |
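The abstract above says that mixed infection cutoffs were identified by receiver operating characteristic (ROC) curve analysis but does not state the selection criterion. As a sketch only, the code below assumes one common choice, Youden's J statistic (sensitivity + specificity - 1), applied to a single hypothetical score such as the fraction of reads mapping to a second genotype. The function name, the score and the example values are all illustrative assumptions, not the study's pipeline or data.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    scores: one numeric value per sample (hypothetical score).
    labels: True for known mixed samples, False for known single-genotype samples.
    A sample is called mixed when its score is >= cutoff.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one mixed and one single-genotype sample")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
        j = tp / positives - fp / negatives  # TPR - FPR equals Youden's J
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Made-up validation set: three single-genotype and two artificially mixed samples.
cutoff, j = youden_cutoff([0.01, 0.02, 0.015, 0.2, 0.3],
                          [False, False, False, True, True])
```

In this toy set the two classes are perfectly separable, so the selected cutoff is the lowest score among the mixed samples.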
|
16
|
Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons. PLoS Comput Biol 2018; 14:e1006498. [PMID: 30543621 PMCID: PMC6314628 DOI: 10.1371/journal.pcbi.1006498] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/04/2018] [Revised: 01/02/2019] [Accepted: 09/10/2018] [Indexed: 01/07/2023] Open
Abstract
Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, the analysis of such data is not well handled by existing methods. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster), and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show how FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.
Viral populations constantly evolve and diversify. In this article we introduce a method, FLEA, for reconstructing and visualizing the details of evolutionary changes. FLEA specifically processes data from sequencing platforms that generate reads that are long, but error-prone. To study the evolutionary dynamics of entire genes during viral infection, data is collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time. However, the experimental and sequencing process is imperfect, so the resulting data contain not only real evolutionary changes, but also mutations and other genetic artifacts caused by sequencing errors. Our method corrects most of these errors by combining thousands of erroneous sequences into a much smaller number of unique consensus sequences that represent biologically meaningful variation. The resulting high-quality sequences are used for further analysis, such as building an evolutionary tree that tracks and interprets the genetic changes in the viral population over time. FLEA is open source, and is freely available online.
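The idea described above, that combining many error-prone reads into a consensus removes most sequencing errors, can be illustrated with a column-wise majority vote. This is a simplified sketch under the assumption that the reads are already aligned to equal length and belong to one cluster. It is not the FLEA pipeline, which the abstract describes as considerably more involved, and the function name is hypothetical.

```python
from collections import Counter

def majority_consensus(aligned_reads):
    """Column-wise majority consensus of equal-length aligned reads.

    '-' marks an alignment gap; columns whose majority symbol is a gap
    are dropped from the consensus.
    """
    if not aligned_reads:
        raise ValueError("no reads given")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)

# Each of the last three reads carries one isolated error; the vote removes them.
reads = ["ACGT", "ACGT", "AGGT", "ACTT", "ACGA"]
result = majority_consensus(reads)
```

Random errors rarely hit the same column in most reads, so they are outvoted, while a true variant shared by the whole cluster survives.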
|
17
|
Low-Bias RNA Sequencing of the HIV-2 Genome from Blood Plasma. J Virol 2018; 93:JVI.00677-18. [PMID: 30333167 PMCID: PMC6288329 DOI: 10.1128/jvi.00677-18] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/09/2018] [Accepted: 09/14/2018] [Indexed: 11/20/2022] Open
Abstract
Accurate determination of the genetic diversity present in the HIV quasispecies is critical for the development of a preventative vaccine: in particular, little is known about viral genetic diversity for the second type of HIV, HIV-2. A better understanding of HIV-2 biology is relevant to the HIV vaccine field because a substantial proportion of infected people experience long-term viral control, and prior HIV-2 infection has been associated with slower HIV-1 disease progression in coinfected subjects. The majority of traditional and next-generation sequencing methods have relied on target amplification prior to sequencing, introducing biases that may obscure the true signals of diversity in the viral population. Additionally, target enrichment through PCR requires a priori sequence knowledge, which is lacking for HIV-2. Therefore, a target-enrichment-free method of library preparation would be valuable for the field. We applied an RNA shotgun sequencing (RNA-Seq) method without PCR amplification to cultured viral stocks and patient plasma samples from HIV-2-infected individuals. Libraries generated from total plasma RNA were analyzed with a two-step pipeline: (i) de novo genome assembly, followed by (ii) read remapping. By this approach, whole-genome sequences were generated with a 28× to 67× mean depth of coverage. Assembled reads showed a low level of GC bias, and comparison of the genome diversities at the intrahost level showed low diversity in the accessory gene vpx in all patients. Our study demonstrates that RNA-Seq is a feasible full-genome de novo sequencing method for blood plasma samples collected from HIV-2-infected individuals.
IMPORTANCE: An accurate picture of viral genetic diversity is critical for the development of a globally effective HIV vaccine. However, sequencing strategies are often complicated by target enrichment prior to sequencing, introducing biases that can distort variant frequencies, which are not easily corrected for in downstream analyses. Additionally, detailed a priori sequence knowledge is needed to inform robust primer design when employing PCR amplification, a factor that is often lacking when working with tropical diseases localized in developing countries. Previous work has demonstrated that direct RNA shotgun sequencing (RNA-Seq) can be used to circumvent these issues for hepatitis C virus (HCV) and norovirus. We applied RNA-Seq to total RNA extracted from HIV-2 blood plasma samples, demonstrating the applicability of this technique to HIV-2 and allowing us to generate a dynamic picture of genetic diversity over the whole genome of HIV-2 in the context of low-bias sequencing.
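The "mean depth of coverage" figure quoted in the abstract is the average number of reads covering each genome position after read remapping. A minimal sketch of that calculation follows, assuming the per-position depths have already been extracted from an alignment into a plain list. The function names and the input layout are illustrative assumptions, not part of the study's pipeline.

```python
def mean_depth(per_base_depth):
    """Mean depth of coverage: sum of per-position depths divided by genome length."""
    if not per_base_depth:
        raise ValueError("empty depth profile")
    return sum(per_base_depth) / len(per_base_depth)

def breadth_of_coverage(per_base_depth, min_depth=1):
    """Fraction of genome positions covered by at least min_depth reads."""
    if not per_base_depth:
        raise ValueError("empty depth profile")
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    return covered / len(per_base_depth)
```

Reporting breadth alongside mean depth is useful because a high mean can hide uncovered regions when coverage is uneven.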
|
18
|
Ji H, Enns E, Brumme CJ, Parkin N, Howison M, Lee ER, Capina R, Marinier E, Avila‐Rios S, Sandstrom P, Van Domselaar G, Harrigan R, Paredes R, Kantor R, Noguera‐Julian M. Bioinformatic data processing pipelines in support of next-generation sequencing-based HIV drug resistance testing: the Winnipeg Consensus. J Int AIDS Soc 2018; 21:e25193. [PMID: 30350345 PMCID: PMC6198166 DOI: 10.1002/jia2.25193] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/18/2018] [Accepted: 09/26/2018] [Indexed: 11/14/2022] Open
Abstract
INTRODUCTION: Next-generation sequencing (NGS) has several advantages over conventional Sanger sequencing for HIV drug resistance (HIVDR) genotyping, including detection and quantitation of low-abundance variants bearing drug resistance mutations (DRMs). However, the high HIV genomic diversity, unprecedented large volume of data, complexity of analysis and potential for error pose significant challenges for data processing. Several NGS analysis pipelines have been developed and used in HIVDR research; however, the absence of uniformity in data processing strategies results in lack of consistency and comparability of outputs from different pipelines. To fill this gap, an international symposium on bioinformatic strategies for NGS-based HIVDR testing was held in February 2018 in Winnipeg, Canada, convening laboratory scientists, bioinformaticians and clinicians involved in four recently developed, publicly available NGS HIVDR pipelines. The goal of this symposium was to establish a consensus on effective bioinformatic strategies for NGS data management and its use for HIVDR reporting. DISCUSSION: Essential functionalities of an NGS HIVDR pipeline were divided into five analytic blocks: (1) NGS read quality control (QC)/quality assurance (QA); (2) NGS read alignment and reference mapping; (3) HIV variant calling and variant QC; (4) NGS HIVDR reporting; and (5) extended data applications and additional considerations for data management. The consensuses reached among the participants on all major aspects of these blocks are summarized here. They encompass not only recommended data management and analysis strategies, but also detailed bioinformatic approaches that help ensure accuracy of the derived HIVDR analysis outputs for both research and potential clinical use. CONCLUSIONS: While NGS is being adopted more broadly in HIVDR testing laboratories, data processing is often a bottleneck hindering its generalized application. The proposed standardization of NGS read QC/QA, read alignment and reference mapping, variant calling and QC, HIVDR reporting and relevant data management strategies in this "Winnipeg Consensus" may serve as a starting guideline for NGS HIVDR data processing that informs the refinement of existing pipelines and those yet to be developed. Moreover, the bioinformatic strategies presented here may apply more broadly to NGS data analysis of microbes harbouring significant genomic diversity.
Affiliation(s)
- Hezhao Ji
- National HIV and Retrovirology Laboratories at JC Wilt Infectious Diseases Research Centre, Public Health Agency of Canada, Winnipeg, MB, Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
| | - Eric Enns
- Bioinformatics Core at the National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
| | | | | | - Mark Howison
- Watson Institute for International and Public Affairs, Brown University, Providence, RI, USA
| | - Emma R. Lee
- National HIV and Retrovirology Laboratories at JC Wilt Infectious Diseases Research Centre, Public Health Agency of Canada, Winnipeg, MB, Canada
| | - Rupert Capina
- National HIV and Retrovirology Laboratories at JC Wilt Infectious Diseases Research Centre, Public Health Agency of Canada, Winnipeg, MB, Canada
| | - Eric Marinier
- Bioinformatics Core at the National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
| | - Santiago Avila‐Rios
- Centre for Research in Infectious Diseases, National Institute of Respiratory Diseases, Mexico City, Mexico
| | - Paul Sandstrom
- National HIV and Retrovirology Laboratories at JC Wilt Infectious Diseases Research Centre, Public Health Agency of Canada, Winnipeg, MB, Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
| | - Gary Van Domselaar
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
- Bioinformatics Core at the National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
| | - Richard Harrigan
- Division of AIDS, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Roger Paredes
- IrsiCaixa AIDS Research Institute, Badalona, Catalonia, Spain
| | - Rami Kantor
- Division of Infectious Diseases, Brown University Alpert Medical School, Providence, RI, USA
| | | |
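Block (3) of the consensus above, variant calling and variant QC, typically involves reporting each variant's frequency at a position and filtering on read depth and frequency. The sketch below illustrates that step for a single position. The function name and both threshold values are placeholder assumptions chosen for the example and are not the recommendations of the Winnipeg Consensus.

```python
def call_variants(base_counts, min_depth=100, min_freq=0.01):
    """Return {base: frequency} for bases at one position that pass both filters.

    base_counts: mapping of base -> number of reads supporting it.
    min_depth and min_freq are placeholder thresholds (assumptions for this
    sketch); positions with too few reads yield no calls at all.
    """
    depth = sum(base_counts.values())
    if depth < min_depth:
        return {}
    return {base: count / depth
            for base, count in base_counts.items()
            if count / depth >= min_freq}

# 1000 reads at one position: G at 1.5% passes the filter, T at 0.5% does not.
calls = call_variants({"A": 980, "G": 15, "T": 5})
```

Applying the depth filter before the frequency filter matters: at low depth a single erroneous read can exceed any percentage threshold.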
|
19
|
Brese RL, Gonzalez-Perez MP, Koch M, O'Connell O, Luzuriaga K, Somasundaran M, Clapham PR, Dollar JJ, Nolan DJ, Rose R, Lamers SL. Ultradeep single-molecule real-time sequencing of HIV envelope reveals complete compartmentalization of highly macrophage-tropic R5 proviral variants in brain and CXCR4-using variants in immune and peripheral tissues. J Neurovirol 2018; 24:439-453. [PMID: 29687407 PMCID: PMC7281851 DOI: 10.1007/s13365-018-0633-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 12/11/2017] [Revised: 02/28/2018] [Accepted: 03/19/2018] [Indexed: 01/07/2023]
Abstract
Despite combined antiretroviral therapy (cART), HIV+ patients still develop neurological disorders, which may be due to persistent HIV infection and selective evolution in brain tissues. Single-molecule real-time (SMRT) sequencing technology offers an improved opportunity to study the relationship among HIV isolates in the brain and lymphoid tissues because it is capable of generating thousands of long sequence reads in a single run. Here, we used SMRT sequencing to generate ~50,000 high-quality full-length HIV envelope sequences (>2200 bp) from seven autopsy tissues from an HIV+/cART+ subject, including three brain and four non-brain sites. Sanger sequencing was used for comparison with SMRT data and to clone functional pseudoviruses for in vitro tropism assays. Phylogenetic analysis demonstrated that brain-derived HIV was compartmentalized from HIV outside the brain and that the variants from each of the three brain tissues grouped independently. Variants from all peripheral tissues were intermixed on the tree but independent of the brain clades. Due to the large number of sequences, a clustering analysis at three similarity thresholds (99, 99.5, and 99.9%) was also performed. All brain sequences clustered exclusive of any non-brain sequences at all thresholds; however, frontal lobe sequences clustered independently of occipital and parietal lobes. Translated sequences revealed potentially functional differences between brain and non-brain sequences in the location of putative N-linked glycosylation sites (N-sites), V1 length, V3 charge, and the number of V4 N-sites. All brain sequences were predicted to use the CCR5 co-receptor, while most non-brain sequences were predicted to use the CXCR4 co-receptor. Tropism results were confirmed by in vitro infection assays. The study is the first to use a SMRT sequencing approach to study HIV compartmentalization in tissues and supports other reports of limited trafficking between brain and non-brain sequences during cART.
Due to the long sequence length, we could observe changes along the entire envelope gene, likely caused by differential selective pressure in the brain that may contribute to neurological disease.
Affiliation(s)
- Robin L Brese
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Maria Paz Gonzalez-Perez
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Matthew Koch
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Olivia O'Connell
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Katherine Luzuriaga
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Mohan Somasundaran
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | - Paul R Clapham
- Program in Molecular Medicine, University of Massachusetts Medical School, Biotech 2, 373 Plantation Street, Worcester, MA, 01605, USA
| | | | - David J Nolan
- Bioinfoexperts, LLC, 718 Bayou Ln, Thibodaux, LA, 70301, USA
| | - Rebecca Rose
- Bioinfoexperts, LLC, 718 Bayou Ln, Thibodaux, LA, 70301, USA.
| | | |
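The abstract above reports clustering at three similarity thresholds (99, 99.5 and 99.9%) without naming the clustering method. As an illustration only, the sketch below assumes a greedy, representative-based scheme in which each sequence joins the first cluster whose representative it matches at or above the threshold. It further assumes the sequences are already aligned to equal length; the function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold):
    """Greedy clustering: join the first cluster whose representative matches.

    The first sequence placed in a cluster serves as its representative.
    Returns a list of clusters, each a list of member sequences.
    """
    clusters = []  # list of (representative, members) pairs
    for seq in seqs:
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return [members for _representative, members in clusters]

# Toy sequences of length 1000: s1 is 99.8% and s2 is 90% identical to s0.
s0 = "A" * 1000
s1 = "A" * 998 + "CC"
s2 = "A" * 900 + "C" * 100
loose = greedy_cluster([s0, s1, s2], 0.99)
strict = greedy_cluster([s0, s1, s2], 0.999)
```

Raising the threshold can only split clusters in this toy case, which mirrors how the stricter thresholds in the study give a finer-grained grouping. Greedy results depend on input order, so a real analysis would normally fix an ordering or use a dedicated clustering tool.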
|
20
|
Wymant C, Blanquart F, Golubchik T, Gall A, Bakker M, Bezemer D, Croucher NJ, Hall M, Hillebregt M, Ong SH, Ratmann O, Albert J, Bannert N, Fellay J, Fransen K, Gourlay A, Grabowski MK, Gunsenheimer-Bartmeyer B, Günthard HF, Kivelä P, Kouyos R, Laeyendecker O, Liitsola K, Meyer L, Porter K, Ristola M, van Sighem A, Berkhout B, Cornelissen M, Kellam P, Reiss P, Fraser C. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver. Virus Evol 2018; 4:vey007. [PMID: 29876136 PMCID: PMC5961307 DOI: 10.1093/ve/vey007] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Indexed: 01/16/2023] Open
Abstract
Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered.
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
Affiliation(s)
- Chris Wymant
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- François Blanquart
- Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Tanya Golubchik
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Astrid Gall
- Department of Veterinary Medicine, University of Cambridge, Cambridge, UK; Virus Genomics, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Margreet Bakker
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
- Nicholas J Croucher
- Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Matthew Hall
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Swee Hoe Ong
- Virus Genomics, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Oliver Ratmann
- Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK; Department of Mathematics, Imperial College London, London, UK
- Jan Albert
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden; Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
- Norbert Bannert
- Division for HIV and Other Retroviruses, Robert Koch Institute, Berlin, Germany
- Jacques Fellay
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Katrien Fransen
- HIV/STI Reference Laboratory, Department of Clinical Science, WHO Collaborating Centre, Institute of Tropical Medicine, Antwerpen, Belgium
- Annabelle Gourlay
- Institute for Global Health, University College London, London, UK; Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
- M Kate Grabowski
- Department of Pathology, Johns Hopkins University, Baltimore, MD, USA
- Huldrych F Günthard
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Pia Kivelä
- Department of Infectious Diseases, Helsinki University Hospital, Helsinki, Finland
- Roger Kouyos
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Switzerland; Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Kirsi Liitsola
- Department of Infectious Diseases, Helsinki University Hospital, Helsinki, Finland
- Laurence Meyer
- INSERM CESP U1018, Université Paris Sud, Université Paris Saclay, APHP, Service de Santé Publique, Hôpital de Bicêtre, Le Kremlin-Bicêtre, France
- Kholoud Porter
- Institute for Global Health, University College London, London, UK
- Matti Ristola
- Department of Infectious Diseases, Helsinki University Hospital, Helsinki, Finland
- Ben Berkhout
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
- Marion Cornelissen
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
- Paul Kellam
- Kymab Ltd, Cambridge, UK; Division of Infectious Diseases, Department of Medicine, Imperial College London, London, UK
- Peter Reiss
- Stichting HIV Monitoring, Amsterdam, The Netherlands; Department of Global Health, Academic Medical Center and Amsterdam Institute for Global Health and Development, Amsterdam, The Netherlands
- Christophe Fraser
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK; Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
|
21
|
McNaughton AL, Sreenu VB, Wilkie G, Gunson R, Templeton K, Leitch ECM. Prevalence of mixed genotype hepatitis C virus infections in the UK as determined by genotype-specific PCR and deep sequencing. J Viral Hepat 2018; 25:524-534. [PMID: 29274184 PMCID: PMC5947153 DOI: 10.1111/jvh.12849] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/31/2017] [Accepted: 11/20/2017] [Indexed: 12/13/2022]
Abstract
The incidence of mixed genotype hepatitis C virus (HCV) infections in the UK is largely unknown. As the efficacy of direct-acting antivirals is variable across different genotypes, treatment regimens are tailored to the infecting genotype, which may pose issues for the treatment of underlying genotypes within undiagnosed mixed genotype HCV infections. There is therefore a need to accurately diagnose mixed genotype infections prior to treatment. PCR-based diagnostic tools were developed to screen for the occurrence of mixed genotype infections caused by the most common UK genotypes, 1a and 3, in a cohort of 506 individuals diagnosed with either of these genotypes. The overall prevalence rate of mixed infection was 3.8%; however, this rate was unevenly distributed, with 6.7% of individuals diagnosed with genotype 3 harbouring genotype 1a strains and only 0.8% of samples from genotype 1a patients harbouring genotype 3 (P < .05). Mixed infection samples consisted of a major and a minor genotype, with the latter constituting less than 21% of the total viral load and, in 67% of cases, less than 1% of the viral load. Analysis of a subset of the cohort by Illumina PCR next-generation sequencing resulted in a much greater incidence rate than obtained by PCR. This may have occurred due to the nonquantitative nature of the technique and despite the designation of false-positive thresholds based on negative controls.
Affiliation(s)
- A. L. McNaughton
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK. Present address: Nuffield Department of Medicine, University of Oxford, Oxford, UK
- V. B. Sreenu
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- G. Wilkie
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- R. Gunson
- West of Scotland Specialist Virology Centre, Royal Infirmary of Glasgow, Glasgow, UK
- E. C. M. Leitch
- MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
|
22
|
Cartwright JF, Anderson K, Longworth J, Lobb P, James DC. Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing. Biotechnol Bioeng 2018; 115:1485-1498. [DOI: 10.1002/bit.26561] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/03/2017] [Revised: 12/01/2017] [Accepted: 02/04/2018] [Indexed: 12/13/2022]
Affiliation(s)
- Joseph F. Cartwright
- Department of Chemical and Biological Engineering, University of Sheffield, Sheffield, UK
- Karin Anderson
- Cell Line Development, BioTherapeutic Pharmaceutical Sciences, Pfizer Inc, Andover, Massachusetts
- Joseph Longworth
- Department of Chemical and Biological Engineering, University of Sheffield, Sheffield, UK
- David C. James
- Department of Chemical and Biological Engineering, University of Sheffield, Sheffield, UK
|
23
|
Comparative genomics reveals new single-nucleotide polymorphisms that can assist in identification of adherent-invasive Escherichia coli. Sci Rep 2018; 8:2695. [PMID: 29426864 PMCID: PMC5807354 DOI: 10.1038/s41598-018-20843-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 09/07/2017] [Accepted: 01/24/2018] [Indexed: 01/19/2023]
Abstract
Adherent-invasive Escherichia coli (AIEC) have been involved in Crohn’s disease (CD). Currently, AIEC are identified by time-consuming techniques based on in vitro infection of cell lines to determine their ability to adhere to and invade intestinal epithelial cells as well as to survive and replicate within macrophages. Our aim was to find signature sequences that can be used to identify the AIEC pathotype. Comparative genomics was performed between three E. coli strain pairs, each pair comprised one AIEC and one non-AIEC with identical pulsotype, sequence type and virulence gene carriage. Genetic differences were further analysed in 22 AIEC and 28 non-AIEC isolated from CD patients and controls. The strain pairs showed similar genome structures, and no gene was specific to AIEC. Three single nucleotide polymorphisms displayed different nucleotide distributions between AIEC and non-AIEC, and four correlated with increased adhesion and/or invasion indices. Here, we present a classification algorithm based on the identification of three allelic variants that can predict the AIEC phenotype with 84% accuracy. Our study corroborates the absence of an AIEC-specific genetic marker distributed across all AIEC strains. Nonetheless, point mutations putatively involved in the AIEC phenotype can be used for the molecular identification of the AIEC pathotype.
|
24
|
Myrmel M, Oma V, Khatri M, Hansen HH, Stokstad M, Berg M, Blomström AL. Single primer isothermal amplification (SPIA) combined with next generation sequencing provides complete bovine coronavirus genome coverage and higher sequence depth compared to sequence-independent single primer amplification (SISPA). PLoS One 2017; 12:e0187780. [PMID: 29112950 PMCID: PMC5675387 DOI: 10.1371/journal.pone.0187780] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 08/23/2017] [Accepted: 10/25/2017] [Indexed: 01/07/2023]
Abstract
Coronaviruses are of major importance for both animal and human health. With the emergence of novel coronaviruses such as SARS and MERS, the need for fast genome characterisation is more important than ever. Further, in order to understand the influence of quasispecies of these viruses in relation to biology, techniques for deep-sequence and full-length viral genome analysis are needed. In the present study, we compared the efficiency of two sequence-independent approaches [sequence-independent single primer amplification (SISPA) and single primer isothermal amplification (SPIA, represented by the Ovation kit)] coupled with high-throughput sequencing to generate the full-length genome of bovine coronavirus (BCoV) from a nasal swab. Both methods achieved high genome coverage (100% for SPIA and 99% for SISPA); however, there was a clear difference in the percentage of reads that mapped to BCoV. While approximately 45% of the Ovation reads mapped to BCoV (sequence depth of 169-284 944), only 0.07% of the SISPA reads (sequence depth of 0-249) mapped to the reference genome. Although BCoV was the focus of the study, we also identified a bovine rhinitis B virus (BRBV) in the data sets. The trend for this virus was similar to that observed for BCoV regarding Ovation vs. SISPA, but with fewer sequences mapping to BRBV due to a lower amount of this virus. In summary, the SPIA approach used in this study produced coverage of the entire BCoV (high copy number) and BRBV (low copy number) and a high sequence/genome depth compared to SISPA. Although this is a limited study, the results indicate that the Ovation method could be a preferred approach for full genome sequencing if a low copy number of viral RNA is expected and if high sequence depth is desired.
Affiliation(s)
- Mette Myrmel
- Department for Food Safety and Infection Biology, Norwegian University of Life Sciences, Oslo, Norway
- Veslemøy Oma
- Department of Production Animal Clinical Sciences, Norwegian University of Life Sciences, Oslo, Norway
- Mamata Khatri
- Department for Food Safety and Infection Biology, Norwegian University of Life Sciences, Oslo, Norway
- Maria Stokstad
- Department of Production Animal Clinical Sciences, Norwegian University of Life Sciences, Oslo, Norway
- Mikael Berg
- Department of Biomedical Sciences and Veterinary Public Health, Section of Virology, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Anne-Lie Blomström
- Department of Biomedical Sciences and Veterinary Public Health, Section of Virology, Swedish University of Agricultural Sciences, Uppsala, Sweden
|
25
|
Bakkeren E, Dolowschiak T, Diard MRJ. Detection of Mutations Affecting Heterogeneously Expressed Phenotypes by Colony Immunoblot and Dedicated Semi-Automated Image Analysis Pipeline. Front Microbiol 2017; 8:2044. [PMID: 29104568 PMCID: PMC5655795 DOI: 10.3389/fmicb.2017.02044] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/13/2017] [Accepted: 10/06/2017] [Indexed: 11/28/2022]
Abstract
To understand how bacteria evolve and adapt to their environment, it can be relevant to monitor phenotypic changes that occur in a population. Single cell level analyses and sorting of mutant cells according to a particular phenotypic readout can constitute efficient strategies. However, when the phenotype of interest is expressed heterogeneously in ancestral isogenic populations of cells, single cell level sorting approaches are not optimal. Phenotypic heterogeneity can for instance make no-expression mutant cells indistinguishable from a subpopulation of wild-type cells transiently not expressing the phenotype. The analysis of clonal populations (e.g., isolated colonies), in which the average phenotype is measured, can circumvent this issue. Indeed, no-expression mutants form negative populations while wild-type clones form populations in which average expression of the phenotype yields a positive signal. We present here an optimized colony immunoblot protocol and a semi-automated image analysis pipeline (ImageJ macro) allowing for rapid detection of clones harboring mutations that affect the heterogeneous (i.e., bimodal) expression of the Type Three Secretion System-1 (TTSS-1) in Salmonella enterica serovar Typhimurium. We show that this protocol can efficiently differentiate clones expressing TTSS-1 at various levels in mixed populations. We were able to detect the emergence of hilC mutants in which the proportion of cells expressing TTSS-1 was reduced compared to the ancestor. We could also follow changes in the frequency of different mutants during long-term infections. This demonstrates that our protocol constitutes a tractable approach to assess semi-quantitatively the evolutionary dynamics of heterogeneous phenotypes, such as the expression of virulence genes, in bacterial populations.
Affiliation(s)
- Erik Bakkeren
- Department of Biology, Institute of Microbiology, ETH Zürich, Zürich, Switzerland
- Tamas Dolowschiak
- Department of Biology, Institute of Microbiology, ETH Zürich, Zürich, Switzerland; Institute of Experimental Immunology, University of Zürich, Zürich, Switzerland
- Médéric R J Diard
- Department of Biology, Institute of Microbiology, ETH Zürich, Zürich, Switzerland
|
26
|
Malhotra R, Jha M, Poss M, Acharya R. A random forest classifier for detecting rare variants in NGS data from viral populations. Comput Struct Biotechnol J 2017; 15:388-395. [PMID: 28819548 PMCID: PMC5548337 DOI: 10.1016/j.csbj.2017.07.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/14/2017] [Revised: 07/01/2017] [Accepted: 07/03/2017] [Indexed: 11/28/2022]
Abstract
We propose a random forest classifier for distinguishing rare variants from sequencing errors in Next Generation Sequencing (NGS) data from viral populations. The method uses counts of k-mers of varying length from the reads of a viral population to train a random forest classifier, called MultiRes, that classifies k-mers as erroneous or rare variants. Our algorithm is rooted in concepts from signal processing and uses a frame-based representation of k-mers. Frames are sets of non-orthogonal basis functions that were traditionally used in signal processing for noise removal. We define discrete spatial signals for genomes and sequenced reads, and show that k-mers of a given size constitute a frame. We evaluate MultiRes on simulated and real viral population datasets, which consist of many low frequency variants, and compare it to the error detection methods used in correction tools known in the literature. MultiRes produces 4 to 500 times fewer false-positive k-mer predictions than other methods, which is essential for accurate estimation of viral population diversity and de novo assembly. It has high recall of the true k-mers, comparable to other error correction methods. MultiRes also has greater than 95% recall for detecting single nucleotide polymorphisms (SNPs) and fewer false-positive SNPs, while detecting a higher number of rare variants than other variant calling methods for viral populations. The software is freely available on GitHub at https://github.com/raunaq-m/MultiRes.
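The multi-resolution k-mer counts that the abstract describes as classifier input can be illustrated with a minimal sketch. This is not the MultiRes code: the function names, the toy reads, and the choice of k = 3 and k = 4 are assumptions made only for illustration, and the random forest training step is omitted.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_count_profile(reads, sizes):
    """Return one k-mer count table per requested k-mer size."""
    return {k: count_kmers(reads, k) for k in sizes}


# Toy reads (hypothetical): the third read carries one substitution (A -> T).
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
profile = kmer_count_profile(reads, sizes=(3, 4))

# k-mers spanning the substitution are seen once; shared k-mers are seen
# in all three reads. A classifier would have to decide whether such
# low-count k-mers are errors or genuine rare variants.
low_count_3mers = sorted(kmer for kmer, n in profile[3].items() if n == 1)
```

Counts at several values of k give each position in a read more than one view of its local support, which is the kind of information a classifier could use to separate errors from rare variants.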
Affiliation(s)
- Raunaq Malhotra
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA 16802, USA
- Manjari Jha
- The School of Electrical Engineering and Computer Science, The Pennsylvania State University, University Park, PA 16802, USA
- Mary Poss
- Department of Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Raj Acharya
- School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA
|
27
|
Kinoti WM, Constable FE, Nancarrow N, Plummer KM, Rodoni B. Analysis of intra-host genetic diversity of Prunus necrotic ringspot virus (PNRSV) using amplicon next generation sequencing. PLoS One 2017; 12:e0179284. [PMID: 28632759 PMCID: PMC5478126 DOI: 10.1371/journal.pone.0179284] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 04/02/2017] [Accepted: 05/08/2017] [Indexed: 12/28/2022]
Abstract
PCR amplicon next generation sequencing (NGS) analysis offers a broadly applicable and targeted approach to detect populations of both high- and low-frequency virus variants in one or more plant samples. In this study, amplicon NGS was used to explore the diversity of the tripartite genome virus, Prunus necrotic ringspot virus (PNRSV), from 53 PNRSV-infected trees using amplicons from conserved gene regions of each of PNRSV RNA1, RNA2 and RNA3. Sequencing of the amplicons from 53 PNRSV-infected trees revealed differing levels of polymorphism across the three different components of the PNRSV genome, with a total number of 5040, 2083 and 5486 sequence variants observed for RNA1, RNA2 and RNA3 respectively. The RNA2 had the lowest diversity of sequences compared to RNA1 and RNA3, reflecting the lack of flexibility tolerated by the replicase gene that is encoded by this RNA component. Distinct PNRSV phylo-groups, consisting of closely related clusters of sequence variants, were observed in each of PNRSV RNA1, RNA2 and RNA3. Most plant samples had a single phylo-group for each RNA component. Haplotype network analysis showed that smaller clusters of PNRSV sequence variants were genetically connected to the largest sequence variant cluster within a phylo-group of each RNA component. Some plant samples had sequence variants occurring in multiple PNRSV phylo-groups in at least one of each RNA, and these phylo-groups formed distinct clades that represent PNRSV genetic strains. Variants within the same phylo-group of each Prunus plant sample had ≥97% similarity, and phylo-groups within a Prunus plant sample and between samples had ≤97% similarity. Based on the analysis of diversity, a definition of a PNRSV genetic strain was proposed. The proposed definition was applied to determine the number of PNRSV genetic strains in each of the plant samples, and the complexity in defining genetic strains in multipartite genome viruses was explored.
Affiliation(s)
- Wycliff M. Kinoti
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- School of Applied Systems Biology, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Fiona E. Constable
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Narelle Nancarrow
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Kim M. Plummer
- Department of Animal, Plant and Soil Sciences, AgriBio, La Trobe University, Melbourne, VIC, Australia
- Brendan Rodoni
- Agriculture Victoria, AgriBio, La Trobe University, Melbourne, VIC, Australia
- School of Applied Systems Biology, AgriBio, La Trobe University, Melbourne, VIC, Australia
|
28
|
Abstract
The human immunodeficiency virus (HIV) evolves rapidly owing to the combined activity of error-prone reverse transcriptase, recombination, and short generation times, leading to extensive viral diversity both within and between hosts. This diversity is a major contributing factor in the failure of the immune system to eradicate the virus and has important implications for the development of suitable drugs and vaccines to combat infection. This review will discuss the recent technological advances that have shed light on HIV evolution and will summarise emerging concepts in this field.
Affiliation(s)
- Sophie M Andrews
- Nuffield Department of Clinical Medicine, University of Oxford, NDMRB, Oxford, UK
- Sarah Rowland-Jones
- Nuffield Department of Clinical Medicine, University of Oxford, NDMRB, Oxford, UK
|
29
|
MetaGaAP: A Novel Pipeline to Estimate Community Composition and Abundance from Non-Model Sequence Data. BIOLOGY 2017; 6:biology6010014. [PMID: 28218638 PMCID: PMC5372007 DOI: 10.3390/biology6010014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/01/2016] [Revised: 01/06/2017] [Accepted: 02/07/2017] [Indexed: 02/01/2023]
Abstract
Next generation sequencing and bioinformatic approaches are increasingly used to quantify microorganisms within populations by analysis of ‘meta-barcode’ data. This approach relies on comparison of amplicon sequences of ‘barcode’ regions from a population with public-domain databases of reference sequences. However, for many organisms relevant ‘barcode’ regions may not have been identified and large databases of reference sequences may not be available. A workflow and software pipeline, ‘MetaGaAP,’ was developed to identify and quantify genotypes through four steps: shotgun sequencing and identification of polymorphisms in a metapopulation to identify custom ‘barcode’ regions of less than 30 polymorphisms within the span of a single ‘read’, amplification and sequencing of the ‘barcode’, generation of a custom database of polymorphisms, and quantitation of the relative abundance of genotypes. The pipeline and workflow were validated in a ‘wild type’ Alphabaculovirus isolate, Helicoverpa armigera single nucleopolyhedrovirus (HaSNPV-AC53) and a tissue-culture derived strain (HaSNPV-AC53-T2). The approach was validated by comparison of polymorphisms in amplicons and shotgun data, and by comparison of predicted dominant and co-dominant genotypes with Sanger sequences. The computational power required to generate and search the database effectively limits the number of polymorphisms that can be included in a barcode to 30 or fewer. The approach can be used in quantitative analysis of the ecology and pathology of non-model organisms.
|
30
|
Zojer M, Schuster LN, Schulz F, Pfundner A, Horn M, Rattei T. Variant profiling of evolving prokaryotic populations. PeerJ 2017; 5:e2997. [PMID: 28224054 PMCID: PMC5316281 DOI: 10.7717/peerj.2997] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/13/2016] [Accepted: 01/17/2017] [Indexed: 12/30/2022]
Abstract
Genomic heterogeneity of bacterial species is observed and studied in experimental evolution experiments and clinical diagnostics, and occurs as micro-diversity of natural habitats. The challenge for genome research is to accurately capture this heterogeneity with the currently used short sequencing reads. Recent advances in NGS technologies improved the speed and coverage and thus allowed for deep sequencing of bacterial populations. This facilitates the quantitative assessment of genomic heterogeneity, including low frequency alleles or haplotypes. However, false positive variant predictions due to sequencing errors and mapping artifacts of short reads need to be prevented. We therefore created VarCap, a workflow for the reliable prediction of different types of variants even at low frequencies. In order to predict SNPs, InDels and structural variations, we evaluated the sensitivity and accuracy of different software tools using synthetic read data. The results suggested that the best sensitivity could be reached by a union of different tools, however at the price of increased false positives. We identified possible reasons for false predictions and used this knowledge to improve the accuracy by post-filtering the predicted variants according to properties such as frequency, coverage, genomic environment/localization and co-localization with other variants. We observed that best precision was achieved by using an intersection of at least two tools per variant. This resulted in the reliable prediction of variants above a minimum relative abundance of 2%. VarCap is designed for being routinely used within experimental evolution experiments or for clinical diagnostics. The detected variants are reported as frequencies within a VCF file and as a graphical overview of the distribution of the different variant/allele/haplotype frequencies. The source code of VarCap is available at https://github.com/ma2o/VarCap. 
In order to provide this workflow to a broad community, we implemented VarCap on a Galaxy webserver, which is accessible at http://galaxy.csb.univie.ac.at.
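The post-filtering idea in this abstract (keep a variant only when at least two tools report it and its relative abundance is at least 2%) can be sketched as follows. This is a simplified illustration, not VarCap's implementation: the function name, the tool names, the variant keys, and the use of a mean frequency across tools are all assumptions.

```python
from collections import defaultdict


def consensus_variants(calls_by_tool, min_tools=2, min_freq=0.02):
    """Keep variants reported by at least `min_tools` callers whose mean
    reported frequency is at least `min_freq`.

    calls_by_tool maps a tool name to {variant_key: frequency}, where
    variant_key is any hashable identifier such as (position, ref, alt).
    """
    support = defaultdict(list)
    for calls in calls_by_tool.values():
        for variant, freq in calls.items():
            support[variant].append(freq)

    kept = {}
    for variant, freqs in support.items():
        mean_freq = sum(freqs) / len(freqs)
        if len(freqs) >= min_tools and mean_freq >= min_freq:
            kept[variant] = mean_freq
    return kept


# Hypothetical calls from three hypothetical tools.
calls = {
    "toolA": {(100, "A", "G"): 0.05, (200, "C", "T"): 0.01},
    "toolB": {(100, "A", "G"): 0.07, (300, "G", "A"): 0.10},
    "toolC": {(200, "C", "T"): 0.01},
}
result = consensus_variants(calls)
```

In this toy input, the variant at position 100 passes both filters, the one at position 200 has two supporting tools but falls below the frequency threshold, and the one at position 300 is reported by a single tool only.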
Affiliation(s)
- Markus Zojer
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
- Lisa N Schuster
- Department of Microbiology and Ecosystems Science, Division of Microbial Ecology, University of Vienna, Vienna, Austria
- Frederik Schulz
- DOE Joint Genome Institute, Lawrence Berkeley National Lab, Walnut Creek, CA, United States
- Alexander Pfundner
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
- Matthias Horn
- Department of Microbiology and Ecosystems Science, Division of Microbial Ecology, University of Vienna, Vienna, Austria
- Thomas Rattei
- Department of Microbiology and Ecosystems Science, Division of Computational Systems Biology, University of Vienna, Vienna, Austria
|
31
|
Brumme CJ, Poon AFY. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res 2016; 239:97-105. [PMID: 27993623 DOI: 10.1016/j.virusres.2016.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/20/2016] [Revised: 12/15/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]
Abstract
Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping.
Affiliation(s)
- Chanson J Brumme
- BC Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
- Art F Y Poon
- Department of Pathology & Laboratory Medicine, Western University, London, Ontario, Canada
|
32
|
Deep Sequencing of Influenza A Virus from a Human Challenge Study Reveals a Selective Bottleneck and Only Limited Intrahost Genetic Diversification. J Virol 2016; 90:11247-11258. [PMID: 27707932 PMCID: PMC5126380 DOI: 10.1128/jvi.01657-16] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Received: 08/22/2016] [Accepted: 09/29/2016] [Indexed: 01/06/2023]
Abstract
Knowledge of influenza virus evolution at the point of transmission and at the intrahost level remains limited, particularly for human hosts. Here, we analyze a unique viral data set of next-generation sequencing (NGS) samples generated from a human influenza challenge study wherein 17 healthy subjects were inoculated with cell- and egg-passaged virus. Nasal wash samples collected from 7 of these subjects were successfully deep sequenced. From these, we characterized changes in the subjects' viral populations during infection and identified differences between the virus in these samples and the viral stock used to inoculate the subjects. We first calculated pairwise genetic distances between the subjects' nasal wash samples, the viral stock, and the influenza virus A/Wisconsin/67/2005 (H3N2) reference strain used to generate the stock virus. These distances revealed that considerable viral evolution occurred at various points in the human challenge study. Further quantitative analyses indicated that (i) the viral stock contained genetic variants that originated and likely were selected for during the passaging process, (ii) direct intranasal inoculation with the viral stock resulted in a selective bottleneck that reduced nonsynonymous genetic diversity in the viral hemagglutinin and nucleoprotein, and (iii) intrahost viral evolution continued over the course of infection. These intrahost evolutionary dynamics were dominated by purifying selection. Our findings indicate that rapid viral evolution can occur during acute influenza infection in otherwise healthy human hosts when the founding population size of the virus is large, as is the case with direct intranasal inoculation. IMPORTANCE Influenza viruses circulating among humans are known to rapidly evolve over time. However, little is known about how influenza virus evolves across single transmission events and over the course of a single infection. 
To address these issues, we analyze influenza virus sequences from a human challenge experiment that initiated infection with a cell- and egg-passaged viral stock, which appeared to have adapted during its preparation. We find that the subjects' viral populations differ genetically from the viral stock, with subjects' viral populations having lower representation of the amino-acid-changing variants that arose during viral preparation. We also find that most of the viral evolution occurring over single infections is characterized by further decreases in the frequencies of these amino-acid-changing variants and that only limited intrahost genetic diversification through new mutations is apparent. Our findings indicate that influenza virus populations can undergo rapid genetic changes during acute human infections.
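The pairwise comparison of samples described in this abstract can be illustrated with a small sketch. The abstract does not say which distance measure the authors used, so the code below assumes a simple one: the mean per-site half-L1 distance between allele-frequency profiles. The function names and the input layout (one dict of base counts per site) are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"


def site_frequencies(counts):
    """Turn per-site base counts (a list of dicts) into ACGT frequency vectors."""
    freqs = []
    for site in counts:
        total = sum(site.get(b, 0) for b in BASES)
        if total == 0:
            raise ValueError("site has no coverage")
        freqs.append([site.get(b, 0) / total for b in BASES])
    return freqs


def population_distance(counts_a, counts_b):
    """Mean per-site half-L1 distance between two samples (0 = identical, 1 = disjoint)."""
    fa, fb = site_frequencies(counts_a), site_frequencies(counts_b)
    if len(fa) != len(fb):
        raise ValueError("samples must cover the same sites")
    per_site = [
        0.5 * sum(abs(x - y) for x, y in zip(sa, sb))
        for sa, sb in zip(fa, fb)
    ]
    return sum(per_site) / len(per_site)


def pairwise_distances(samples):
    """Distance for every unordered pair of named samples."""
    return {
        (a, b): population_distance(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Illustrative (made-up) counts for a two-site region.
samples = {
    "stock": [{"A": 90, "G": 10}, {"C": 100}],
    "subject_1": [{"A": 100}, {"C": 100}],
}
distances = pairwise_distances(samples)
```

A frequency-based distance is used here because deep-sequencing samples are populations rather than single sequences; a consensus-level measure would hide the minority variants that the study tracks.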
|
33
|
Leung P, Eltahla AA, Lloyd AR, Bull RA, Luciani F. Understanding the complex evolution of rapidly mutating viruses with deep sequencing: Beyond the analysis of viral diversity. Virus Res 2016; 239:43-54. [PMID: 27888126 DOI: 10.1016/j.virusres.2016.10.014] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 07/21/2016] [Revised: 10/24/2016] [Accepted: 10/25/2016] [Indexed: 12/24/2022]
Abstract
With the advent of affordable deep sequencing technologies, detection of low frequency variants within genetically diverse viral populations can now be achieved with unprecedented depth and efficiency. The high-resolution data provided by next generation sequencing technologies is currently recognised as the gold standard in estimation of viral diversity. In the analysis of rapidly mutating viruses, longitudinal deep sequencing datasets from viral genomes during individual infection episodes, as well as at the epidemiological level during outbreaks, now allow for more sophisticated analyses such as statistical estimates of the impact of complex mutation patterns on the evolution of the viral populations both within and between hosts. These analyses are revealing more accurate descriptions of the evolutionary dynamics that underpin the rapid adaptation of these viruses to the host response, and to drug therapies. This review assesses recent developments in methods and provides informative research examples using deep sequencing data generated from rapidly mutating viruses infecting humans, particularly hepatitis C virus (HCV), human immunodeficiency virus (HIV), Ebola virus and influenza virus, to understand the evolution of viral genomes and to explore the relationship between viral mutations and the host adaptive immune response. Finally, we discuss limitations in current technologies, and future directions that take advantage of publicly available large deep sequencing datasets.
Affiliation(s)
- Preston Leung
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Auda A Eltahla
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Andrew R Lloyd
- The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Rowena A Bull
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia
| | - Fabio Luciani
- School of Medical Sciences, Faculty of Medicine, UNSW Australia, Sydney, NSW 2052, Australia; The Kirby Institute, UNSW Australia, Sydney, NSW 2052, Australia.
|
34
|
Posada-Cespedes S, Seifert D, Beerenwinkel N. Recent advances in inferring viral diversity from high-throughput sequencing data. Virus Res 2016; 239:17-32. [PMID: 27693290 DOI: 10.1016/j.virusres.2016.09.016] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 06/24/2016] [Revised: 09/23/2016] [Accepted: 09/24/2016] [Indexed: 02/05/2023]
Abstract
Rapidly evolving RNA viruses prevail within a host as a collection of closely related variants, referred to as viral quasispecies. Advances in high-throughput sequencing (HTS) technologies have facilitated the assessment of the genetic diversity of such virus populations at an unprecedented level of detail. However, analysis of HTS data from virus populations is challenging due to short, error-prone reads. In order to account for uncertainties originating from these limitations, several computational and statistical methods have been developed for studying the genetic heterogeneity of virus populations. Here, we review methods for the analysis of HTS reads, including approaches to local diversity estimation and global haplotype reconstruction. Challenges posed by aligning reads, as well as the impact of reference biases on diversity estimates, are also discussed. In addition, we address some of the experimental approaches designed to improve the biological signal-to-noise ratio. In the future, computational methods for the analysis of heterogeneous virus populations are likely to continue being complemented by technological developments.
Affiliation(s)
- Susana Posada-Cespedes
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
| | - David Seifert
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland
| | - Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland; SIB, Basel, Switzerland.
|
35
|
Trappe K, Marschall T, Renard BY. Detecting horizontal gene transfer by mapping sequencing reads across species boundaries. Bioinformatics 2016; 32:i595-i604. [DOI: 10.1093/bioinformatics/btw423] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023] Open
|
36
|
Pearson SK, Bradford TM, Ansari TH, Bull CM, Gardner MG. MHC genotyping from next-generation sequencing: detailed methodology for the gidgee skink, Egernia stokesii. T ROY SOC SOUTH AUST 2016. [DOI: 10.1080/03721426.2016.1216735] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]
Affiliation(s)
- S. K. Pearson
- School of Biological Sciences, Flinders University of South Australia, Bedford Park, Australia
| | - T. M. Bradford
- School of Biological Sciences, Flinders University of South Australia, Bedford Park, Australia
| | - T. H. Ansari
- School of Biological Sciences, Flinders University of South Australia, Bedford Park, Australia
| | - C. M. Bull
- School of Biological Sciences, Flinders University of South Australia, Bedford Park, Australia
| | - M. G. Gardner
- School of Biological Sciences, Flinders University of South Australia, Bedford Park, Australia
- Evolutionary Biology Unit, South Australian Museum, Adelaide, Australia
|
37
|
Vectors as Epidemiological Sentinels: Patterns of Within-Tick Borrelia burgdorferi Diversity. PLoS Pathog 2016; 12:e1005759. [PMID: 27414806 PMCID: PMC4944968 DOI: 10.1371/journal.ppat.1005759] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 04/18/2016] [Accepted: 06/18/2016] [Indexed: 01/13/2023] Open
Abstract
Hosts, including humans, other vertebrates, and arthropods, are frequently infected with heterogeneous populations of pathogens. Within-host pathogen diversity has major implications for human health, epidemiology, and pathogen evolution. However, pathogen diversity within hosts is difficult to characterize and little is known about the levels and sources of within-host diversity maintained in natural populations of disease vectors. Here, we examine genomic variation of the Lyme disease bacteria, Borrelia burgdorferi (Bb), in 98 individual field-collected tick vectors as a model for study of within-host processes. Deep population sequencing reveals extensive and previously undocumented levels of Bb variation: the majority (~70%) of ticks harbor mixed strain infections, which we define as levels of Bb diversity pre-existing in a diverse inoculum. Within-tick diversity is thus a sample of the variation present within vertebrate hosts. Within individual ticks, we detect signatures of positive selection. Genes most commonly under positive selection across ticks include those involved in dissemination in vertebrate hosts and evasion of the vertebrate immune complement. By focusing on tick-borne Bb, we show that vectors can serve as epidemiological and evolutionary sentinels: within-vector pathogen diversity can be a useful and unbiased way to survey circulating pathogen diversity and identify evolutionary processes occurring in natural transmission cycles. Lyme disease, caused by a bacterium carried by deer ticks, is the most common vector-borne disease in North America and over 30,000 cases are reported each year in the United States. Ticks may be infected with multiple strains of the Lyme disease bacteria, which differ in transmissibility and the harm they pose to humans. In this study, we collected 98 infected deer ticks from across the United States and southern Canada. 
We used genetic techniques to investigate the diversity of the Lyme disease bacteria infecting each individual tick. We find that 70% of ticks are infected with multiple strains of the Lyme disease bacteria, indicating that humans may be exposed to and infected with multiple bacterial strains from a single tick bite. We also find evidence that the Lyme disease bacteria is evolving in response to the immune defenses of its natural hosts (including rodents and birds). Our study shows that individual ticks and other disease vectors can be studied as epidemiological sentinels, which reveal the extensive diversity of pathogens circulating in natural disease cycles and how they are evolving.
|
38
|
Strandh M, Råberg L. Within-host competition between Borrelia afzelii ospC strains in wild hosts as revealed by massively parallel amplicon sequencing. Philos Trans R Soc Lond B Biol Sci 2016; 370:rstb.2014.0293. [PMID: 26150659 DOI: 10.1098/rstb.2014.0293] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 11/12/2022] Open
Abstract
Infections frequently consist of more than one strain of a given pathogen. Experiments have shown that co-infecting strains often compete, so that the infection intensity of each strain in mixed infections is lower than in single strain infections. Such within-host competition can have important epidemiological and evolutionary consequences. However, the extent of competition has rarely been investigated in wild, naturally infected hosts, where there is noise in the form of varying inoculation doses, asynchronous infections and host heterogeneity, which can potentially alleviate or eliminate competition. Here, we investigated the extent of competition between Borrelia afzelii strains (as determined by ospC genotype) in three host species sampled in the wild. For this purpose, we developed a protocol for 454 amplicon sequencing of ospC, which allows both detection and quantification of each individual strain in an infection. Each host individual was infected with one to six ospC strains. The infection intensity of each strain was lower in mixed infections than in single ones, showing that there was competition. Rank-abundance plots revealed that there was typically one dominant strain, but that the evenness of the relative infection intensity of the different strains in an infection increased with the multiplicity of infection. We conclude that within-host competition can play an important role under natural conditions despite many potential sources of noise, and that quantification by next-generation amplicon sequencing offers new possibilities to dissect within-host interactions in naturally infected hosts.
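The rank-abundance and evenness analysis mentioned in this abstract can be sketched from per-strain amplicon read counts. The abstract does not name the evenness index used, so the code below assumes Pielou's evenness (Shannon entropy divided by the log of the number of strains). The function names and the example counts are hypothetical.

```python
import math


def relative_abundances(read_counts):
    """Proportion of reads assigned to each detected strain."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("no reads in sample")
    return {strain: n / total for strain, n in read_counts.items() if n > 0}


def rank_abundance(read_counts):
    """Relative abundances sorted from the dominant strain downwards."""
    return sorted(relative_abundances(read_counts).values(), reverse=True)


def pielou_evenness(read_counts):
    """Shannon entropy scaled to [0, 1]; undefined (None) for a single strain."""
    props = list(relative_abundances(read_counts).values())
    if len(props) < 2:
        return None
    shannon = -sum(p * math.log(p) for p in props)
    return shannon / math.log(len(props))


# Made-up ospC read counts for one host with a dominant strain.
host = {"ospC_A": 900, "ospC_B": 80, "ospC_C": 20}
ranks = rank_abundance(host)
evenness = pielou_evenness(host)
```

Read counts are only a proxy for infection intensity; turning them into absolute strain loads would need an independent measure of total bacterial load, which this sketch does not model.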
Affiliation(s)
- Maria Strandh
- Molecular Ecology and Evolution Lab, Department of Biology, Lund University, Sölvegatan 37, Lund 223 62, Sweden
| | - Lars Råberg
- Functional Zoology, Department of Biology, Lund University, Sölvegatan 35, Lund 223 62, Sweden
|
39
|
Trémeaux P, Caporossi A, Thélu MA, Blum M, Leroy V, Morand P, Larrat S. Hepatitis C virus whole genome sequencing: Current methods/issues and future challenges. Crit Rev Clin Lab Sci 2016; 53:341-51. [PMID: 27068766 DOI: 10.3109/10408363.2016.1163663] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 02/06/2023]
Abstract
Therapy for hepatitis C is currently undergoing a revolution. The arrival of new antiviral agents targeting viral proteins reinforces the need for a better knowledge of the viral strains infecting each patient. Hepatitis C virus (HCV) whole genome sequencing provides essential information for precise typing, study of the viral natural history or identification of resistance-associated variants. Whole genome sequencing was first performed with Sanger sequencing; the arrival of next-generation sequencing (NGS) has simplified the technical process and provided more detailed data on the nature and evolution of viral quasi-species. We will review the different techniques used for HCV complete genome sequencing and their applications, both before and after the advent of NGS. The progress brought by new and future technologies will also be discussed, as well as the remaining difficulties, largely due to the genomic variability.
Affiliation(s)
- Pauline Trémeaux
- Laboratoire de Virologie, Institut de Biologie et Pathologie, CHU Grenoble-Alpes, Grenoble, France; Institut de Biologie Structurale (IBS), UMR 5075 CEA-CNRS-UGA, Grenoble, France
| | - Alban Caporossi
- Centre d'investigation clinique, Santé publique, CHU Grenoble-Alpes, Grenoble, France; Laboratoire TIMC-IMAG, Université de Grenoble Alpes, Grenoble, France
| | - Marie-Ange Thélu
- Clinique d'Hépato-gastroentérologie, Pôle Digidune, CHU Grenoble-Alpes, Grenoble, France
| | - Michael Blum
- Laboratoire TIMC-IMAG, Université de Grenoble Alpes, Grenoble, France
| | - Vincent Leroy
- Clinique d'Hépato-gastroentérologie, Pôle Digidune, CHU Grenoble-Alpes, Grenoble, France
| | - Patrice Morand
- Laboratoire de Virologie, Institut de Biologie et Pathologie, CHU Grenoble-Alpes, Grenoble, France; Institut de Biologie Structurale (IBS), UMR 5075 CEA-CNRS-UGA, Grenoble, France
| | - Sylvie Larrat
- Laboratoire de Virologie, Institut de Biologie et Pathologie, CHU Grenoble-Alpes, Grenoble, France; Institut de Biologie Structurale (IBS), UMR 5075 CEA-CNRS-UGA, Grenoble, France
|
40
|
Liang M, Raley C, Zheng X, Kutty G, Gogineni E, Sherman BT, Sun Q, Chen X, Skelly T, Jones K, Stephens R, Zhou B, Lau W, Johnson C, Imamichi T, Jiang M, Dewar R, Lempicki RA, Tran B, Kovacs JA, Huang DW. Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads. BioData Min 2016; 9:13. [PMID: 27051465 PMCID: PMC4820869 DOI: 10.1186/s13040-016-0090-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/18/2015] [Accepted: 03/22/2016] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Gene isoforms are commonly found in both prokaryotes and eukaryotes. Since each isoform may perform a specific function in response to changing environmental conditions, studying the dynamics of gene isoforms is important in understanding biological processes and disease conditions. However, genome-wide identification of gene isoforms is technically challenging due to the high degree of sequence identity among isoforms. The traditional targeted sequencing approach, involving Sanger sequencing of plasmid-cloned PCR products, has low throughput and is very tedious and time-consuming. Next-generation sequencing technologies such as Illumina and 454 achieve high throughput but their short read lengths are a critical barrier to accurate assembly of highly similar gene isoforms, and may result in ambiguities and false joining during sequence assembly. More recently, the third generation sequencer represented by the PacBio platform offers sufficient throughput and long reads covering the full length of typical genes, thus providing a potential to reliably profile gene isoforms. However, the PacBio long reads are error-prone and cannot be effectively analyzed by traditional assembly programs. RESULTS We present a clustering-based analysis pipeline integrated with PacBio sequencing data for profiling highly similar gene isoforms. This approach was first evaluated in comparison to de novo assembly of 454 reads using a benchmark admixture containing 10 known, cloned msg genes encoding the major surface glycoprotein of Pneumocystis jirovecii. All 10 msg isoforms were successfully reconstructed with the expected length (~1.5 kb) and correct sequence by the new approach, while 454 reads could not be correctly assembled using various assembly programs. When using an additional benchmark admixture containing 22 known P. jirovecii msg isoforms, this approach accurately reconstructed all but 4 of these isoforms in their full length (~3 kb); these 4 isoforms were present in low concentrations in the admixture. Finally, when applied to the original clinical sample from which the 22 known msg isoforms were cloned, this approach successfully identified not only all known isoforms accurately (~3 kb each) but also 48 novel isoforms. CONCLUSIONS PacBio sequencing integrated with the clustering-based analysis pipeline achieves high-throughput and high-resolution discrimination of highly similar sequences, and can serve as a new approach for genome-wide characterization of gene isoforms and other highly repetitive sequences.
Affiliation(s)
- Ma Liang
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Castle Raley
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Xin Zheng
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Geetha Kutty
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Emile Gogineni
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Brad T. Sherman
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Qiang Sun
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Xiongfong Chen
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Thomas Skelly
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Kristine Jones
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Robert Stephens
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Bin Zhou
- Center of Information Technology, National Institutes of Health (NIH), Bethesda, MD USA
| | - William Lau
- Center of Information Technology, National Institutes of Health (NIH), Bethesda, MD USA
| | - Calvin Johnson
- Center of Information Technology, National Institutes of Health (NIH), Bethesda, MD USA
| | - Tomozumi Imamichi
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Minkang Jiang
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Robin Dewar
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Richard A. Lempicki
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Bao Tran
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
| | - Joseph A. Kovacs
- Critical Care Medicine Department, Clinical Center, Frederick, MD USA
| | - Da Wei Huang
- Leidos BioMedical Research, Inc., Frederick National Laboratory for Cancer Research, NIH, Frederick, MD USA
- Current Affiliation: National Cancer Institute, NIH, Bethesda, MD USA
|
41
|
A Method for Amplicon Deep Sequencing of Drug Resistance Genes in Plasmodium falciparum Clinical Isolates from India. J Clin Microbiol 2016; 54:1500-1511. [PMID: 27008882 PMCID: PMC4879288 DOI: 10.1128/jcm.00235-16] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 02/04/2016] [Accepted: 03/20/2016] [Indexed: 11/20/2022] Open
Abstract
A major challenge to global malaria control and elimination is early detection and containment of emerging drug resistance. Next-generation sequencing (NGS) methods provide the resolution, scalability, and sensitivity required for high-throughput surveillance of molecular markers of drug resistance. We have developed an amplicon sequencing method on the Ion Torrent PGM platform for targeted resequencing of a panel of six Plasmodium falciparum genes implicated in resistance to first-line antimalarial therapy, including artemisinin combination therapy, chloroquine, and sulfadoxine-pyrimethamine. The protocol was optimized using 12 geographically diverse P. falciparum reference strains and successfully applied to multiplexed sequencing of 16 clinical isolates from India. The sequencing results from the reference strains showed 100% concordance with previously reported drug resistance-associated mutations. Single-nucleotide polymorphisms (SNPs) in clinical isolates revealed a number of known resistance-associated mutations and other nonsynonymous mutations that have not been implicated in drug resistance. SNP positions containing multiple allelic variants were used to identify three clinical samples containing mixed genotypes indicative of multiclonal infections. The amplicon sequencing protocol has been designed for the benchtop Ion Torrent PGM platform and can be operated with minimal bioinformatics infrastructure, making it ideal for use in countries that are endemic for the disease to facilitate routine large-scale surveillance of the emergence of drug resistance and to ensure continued success of the malaria treatment policy.
|
42
|
Vincent AT, Derome N, Boyle B, Culley AI, Charette SJ. Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money. J Microbiol Methods 2016; 138:60-71. [PMID: 26995332 DOI: 10.1016/j.mimet.2016.02.016] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 09/30/2015] [Revised: 01/26/2016] [Accepted: 02/24/2016] [Indexed: 12/16/2022]
Abstract
The Sanger sequencing method produces relatively long DNA sequences of unmatched quality and has long been considered the gold standard for sequencing DNA. Many improvements of the Sanger method that culminated with fluorescent dyes coupled with automated capillary electrophoresis enabled the sequencing of the first genomes. Nevertheless, using this technology to sequence whole genomes was costly, laborious and time-consuming even for genomes that are relatively small in size. A major technological advance was the introduction of next-generation sequencing (NGS) pioneered by 454 Life Sciences in the early part of the 21st century. NGS allowed scientists to sequence thousands to millions of DNA molecules in a single machine run. Since then, new NGS technologies have emerged and existing NGS platforms have been improved, enabling the production of genome sequences at an unprecedented rate as well as broadening the spectrum of NGS applications. The current affordability of generating genomic information, especially with microbial samples, has resulted in a false sense of simplicity that belies the fact that many researchers still consider these technologies a black box. In this review, our objective is to identify and discuss four steps that we consider crucial to the success of any NGS-related project. These steps are: (1) the definition of the research objectives beyond sequencing and appropriate experimental planning, (2) library preparation, (3) sequencing and (4) data analysis. The goal of this review is to give an overview of the process, from sample to analysis, and discuss how to optimize your resources to achieve the most from your NGS-based research. Regardless of the evolution and improvement of the sequencing technologies, these four steps will remain relevant.
Affiliation(s)
- Antony T Vincent
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, QC G1V 4G5, Canada
| | - Nicolas Derome
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biologie, Faculté des sciences et de génie, Université Laval, Quebec City G1V 0A6, Canada
| | - Brian Boyle
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
| | - Alexander I Culley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Groupe de Recherche en Écologie Buccale (GREB), Faculté de médecine dentaire, Université Laval, Quebec City, QC G1V 0A6, Canada
| | - Steve J Charette
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada; Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, Quebec City, QC G1V 0A6, Canada; Centre de recherche de l'Institut universitaire de cardiologie et de pneumologie de Québec, Quebec City, QC G1V 4G5, Canada.
|
43
|
Huang DW, Raley C, Jiang MK, Zheng X, Liang D, Rehman MT, Highbarger HC, Jiao X, Sherman B, Ma L, Chen X, Skelly T, Troyer J, Stephens R, Imamichi T, Pau A, Lempicki RA, Tran B, Nissley D, Lane HC, Dewar RL. Towards Better Precision Medicine: PacBio Single-Molecule Long Reads Resolve the Interpretation of HIV Drug Resistant Mutation Profiles at Explicit Quasispecies (Haplotype) Level. JOURNAL OF DATA MINING IN GENOMICS & PROTEOMICS 2016; 7:182. [PMID: 26949565 PMCID: PMC4775093 DOI: 10.4172/2153-0602.1000182] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 12/11/2022]
Abstract
Development of HIV-1 drug resistance mutations (HDRMs) is one of the major reasons for the clinical failure of antiretroviral therapy. Treatment success rates can be improved by applying personalized anti-HIV regimens based on a patient's HDRM profile. However, the sensitivity and specificity of the HDRM profile are limited by the methods used for detection. Sanger-based sequencing technology has traditionally been used for determining HDRM profiles at the single nucleotide variant (SNV) level, but with a sensitivity of only ≥ 20% in the HIV population of a patient. Next Generation Sequencing (NGS) technologies offer greater detection sensitivity (~ 1%) and larger scope (hundreds of samples per run). However, NGS technologies produce reads that are too short to enable the detection of the physical linkages of individual SNVs across the haplotype of each HIV strain present. In this article, we demonstrate that the single-molecule long reads generated using the Third Generation Sequencer (TGS), PacBio RS II, along with the appropriate bioinformatics analysis method, can resolve the HDRM profile at a more advanced quasispecies level. The case studies on patients' HIV samples showed that the quasispecies view produced using the PacBio method offered greater detection sensitivity and was more comprehensive for understanding HDRM situations, and is thus complementary to both Sanger and NGS technologies. In conclusion, the PacBio method, providing a promising new quasispecies level of HDRM profiling, may effect an important change in the field of HIV drug resistance research.
Affiliation(s)
- Da Wei Huang
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
- National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Castle Raley
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Min Kang Jiang
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Xin Zheng
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Dun Liang
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - M Tauseef Rehman
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Helene C. Highbarger
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Xiaoli Jiao
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Brad Sherman
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Liang Ma
- Critical Care Medicine Department, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Xiaofeng Chen
- Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Thomas Skelly
- Advanced Biomedical Computing Center, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Jennifer Troyer
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
- National Human Genome Research Institute, National Institutes of Health, Rockville, MD, 20852, USA
| | - Robert Stephens
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Tomozumi Imamichi
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Alice Pau
- Division of Clinical Research, National Institute of Allergy & Infectious Diseases, USA
| | - Richard A Lempicki
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Bao Tran
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - Dwight Nissley
- Cancer Research Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| | - H Clifford Lane
- Division of Clinical Research, National Institute of Allergy & Infectious Diseases, USA
| | - Robin L. Dewar
- Applied and Developmental Research Directorate, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, MD 21702, USA
| |
|
44
|
Ode H, Matsuda M, Matsuoka K, Hachiya A, Hattori J, Kito Y, Yokomaku Y, Iwatani Y, Sugiura W. Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq. Front Microbiol 2015; 6:1258. [PMID: 26617593 PMCID: PMC4641896 DOI: 10.3389/fmicb.2015.01258] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 09/12/2015] [Accepted: 10/29/2015] [Indexed: 12/29/2022]
Abstract
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only <1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome.
Affiliation(s)
- Hirotaka Ode
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Masakazu Matsuda
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Kazuhiro Matsuoka
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Atsuko Hachiya
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Junko Hattori
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Yumiko Kito
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Yoshiyuki Yokomaku
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Yasumasa Iwatani
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan; Department of AIDS Research, Graduate School of Medicine, Nagoya University, Nagoya, Japan
| | - Wataru Sugiura
- Department of Infectious Diseases and Immunology, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan; Department of AIDS Research, Graduate School of Medicine, Nagoya University, Nagoya, Japan
| |
|
45
|
Abstract
PURPOSE OF REVIEW The review discusses new technologies for the sensitive detection of HIV drug resistance, with a focus on applications in antiretroviral treatment (ART)-naïve populations. RECENT FINDINGS Conventional sequencing is well established for detecting HIV drug resistance in routine care and guides optimal treatment selection in patients starting ART. Access to conventional sequencing is nearly universal in Western countries, but remains limited in Asia, Latin America, and Africa. Technological advances now allow detection of resistance with greatly improved sensitivity compared with conventional sequencing, variably increasing the yield of resistance testing in ART-naïve populations. There is strong cumulative evidence from retrospective studies that sensitive detection of resistant mutants in baseline plasma samples lacking resistance by conventional sequencing more than doubles the risk of virological failure after starting efavirenz-based or nevirapine-based ART. SUMMARY Sensitive resistance testing methods are mainly confined to research applications and in this context have provided great insight into the dynamics of drug resistance development, persistence, and transmission. Adoption in care settings is becoming increasingly possible, although important challenges remain. Platforms for diagnostic use must undergo technical improvements to ensure good performance and ease of use, and clinical validation is required to ensure utility.
|
46
|
Messenger LA, Miles MA, Bern C. Between a bug and a hard place: Trypanosoma cruzi genetic diversity and the clinical outcomes of Chagas disease. Expert Rev Anti Infect Ther 2015; 13:995-1029. [PMID: 26162928 PMCID: PMC4784490 DOI: 10.1586/14787210.2015.1056158] [Citation(s) in RCA: 127] [Impact Index Per Article: 14.1] [Indexed: 12/21/2022]
Abstract
Over the last 30 years, concomitant with successful transnational disease control programs across Latin America, Chagas disease has expanded from a neglected, endemic parasitic infection of the rural poor to an urbanized chronic disease, and now a potentially emergent global health problem. Trypanosoma cruzi infection has a highly variable clinical course, ranging from complete absence of symptoms to severe and often fatal cardiovascular and/or gastrointestinal manifestations. To date, few correlates of clinical disease progression have been identified. Elucidating a putative role for T. cruzi strain diversity in Chagas disease pathogenesis is complicated by the scarcity of parasites in clinical specimens and the limitations of our contemporary genotyping techniques. This article systematically reviews the historical literature, given our current understanding of parasite genetic diversity, to evaluate the evidence for any association between T. cruzi genotype and chronic clinical outcome, risk of congenital transmission or reactivation and orally transmitted outbreaks.
Affiliation(s)
- Louisa A Messenger
- Department of Pathogen Molecular Biology, Faculty of Infectious Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
| | - Michael A Miles
- Department of Pathogen Molecular Biology, Faculty of Infectious Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
| | - Caryn Bern
- Global Health Sciences, Department of Epidemiology and Biostatistics, School of Medicine, University of California San Francisco, San Francisco, CA, USA
| |
|
47
|
Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 2015. [PMID: 26217378 PMCID: PMC4493402 DOI: 10.3389/fgene.2015.00235] [Citation(s) in RCA: 109] [Impact Index Per Article: 12.1] [Indexed: 12/30/2022]
Abstract
Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community's focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.
Affiliation(s)
- Nathan D Olson
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Steven P Lund
- Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Rebecca E Colman
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
| | - Jeffrey T Foster
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
| | - Jason W Sahl
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
| | - James M Schupp
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
| | - Paul Keim
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
| | - Jayne B Morrow
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Marc L Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Justin M Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| |
|
48
|
Abstract
The majority of new and existing cases of HCV infection in high-income countries occur among people who inject drugs (PWID). Ongoing high-risk behaviours can lead to HCV re-exposure, resulting in mixed HCV infection and reinfection. Assays used to screen for mixed infection vary widely in sensitivity, particularly with respect to their capacity for detecting minor variants (<20% of the viral population). The prevalence of mixed infection among PWID ranges from 14% to 39% when sensitive assays are used. Mixed infection compromises HCV treatment outcomes with interferon-based regimens. HCV reinfection can also occur after successful interferon-based treatment among PWID, but the rate of reinfection is low (0-5 cases per 100 person-years). A revolution in HCV therapeutic development has occurred in the past few years, with the advent of interferon-free, but still genotype-specific, regimens based on direct-acting antiviral agents. However, little is known about whether mixed infection and reinfection have an effect on HCV treatment outcomes in the setting of new direct-acting antiviral agents. This Review characterizes the epidemiology and natural history of mixed infection and reinfection among PWID, methodologies for detection, the potential implications for HCV treatment and considerations for the design of future studies.
|
49
|
Ho CKY, Welkers MRA, Thomas XV, Sullivan JC, Kieffer TL, Reesink HW, Rebers SPH, de Jong MD, Schinkel J, Molenkamp R. A comparison of 454 sequencing and clonal sequencing for the characterization of hepatitis C virus NS3 variants. J Virol Methods 2015; 219:28-37. [PMID: 25818622 DOI: 10.1016/j.jviromet.2015.03.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/08/2015] [Revised: 03/17/2015] [Accepted: 03/18/2015] [Indexed: 01/09/2023]
Abstract
We compared 454 amplicon sequencing with clonal sequencing for the characterization of intra-host hepatitis C virus (HCV) NS3 variants. Clonal and 454 sequences were obtained from 12 patients enrolled in a clinical phase I study for telaprevir, an NS3-4a protease inhibitor. Thirty-nine datasets were used to compare the consensus sequence, average pairwise distance, normalized Shannon entropy, phylogenetic tree topology and the number and frequency of variants derived from both sequencing techniques. In general, a good concordance was observed between both techniques for the majority of datasets. Discordant results were observed for 5 out of 39 clonal and 454 datasets, which could be attributed to primer-related selective amplification used for clonal sequencing. Both 454 and clonal datasets consisted of a few major variants and a large number of low-frequency variants. Telaprevir resistance-associated variants were observed in low frequencies and were detected more often by 454. We conclude that performance of 454 and clonal sequencing is comparable for the characterization of intra-host virus populations. Not surprisingly, 454 is superior for the detection of low frequency resistance-associated variants. However, despite the greater coverage, 454 failed to detect some low frequency variants detected by clonal sequencing.
Affiliation(s)
- Cynthia K Y Ho
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Matthijs R A Welkers
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Xiomara V Thomas
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - James C Sullivan
- Department of Infectious Diseases, Vertex Pharmaceuticals Incorporated, Cambridge, MA 02139, USA.
| | - Tara L Kieffer
- Department of Infectious Diseases, Vertex Pharmaceuticals Incorporated, Cambridge, MA 02139, USA.
| | - Henk W Reesink
- Department of Gastroenterology and Hepatology, Academic Medical Center, Amsterdam 1104 AZ, The Netherlands.
| | - Sjoerd P H Rebers
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Menno D de Jong
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Janke Schinkel
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| | - Richard Molenkamp
- Department of Medical Microbiology, Academic Medical Center, Amsterdam 1105 AZ, The Netherlands.
| |
|
50
|
Kortenhoeven C, Joubert F, Bastos ADS, Abolnik C. Virus genome dynamics under different propagation pressures: reconstruction of whole genome haplotypes of West Nile viruses from NGS data. BMC Genomics 2015; 16:118. [PMID: 25766117 PMCID: PMC4338619 DOI: 10.1186/s12864-015-1340-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/16/2014] [Accepted: 02/12/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Extensive focus is placed on the comparative analyses of consensus genotypes in the study of West Nile virus (WNV) emergence. Few studies account for genetic change in the underlying WNV quasispecies population variants. These variants are not discernible in the consensus genome at the time of emergence, and the maintenance of mutation-selection equilibria of population variants is greatly underestimated. The emergence of lineage 1 WNV strains has been studied extensively, but recent epidemics caused by lineage 2 WNV strains in Hungary, Austria, Greece and Italy emphasize the increasing importance of this lineage to public health. In this study we used Next Generation Sequencing (NGS) data of a historic lineage 2 WNV strain to explore the quasispecies dynamics of minority variants that contribute to cell-tropism and host determination (i.e. the ability to infect different cell types or cells from different species). RESULTS Minority variants contributing to host cell membrane association persist in the viral population without contributing to the genetic change in the consensus genome. Minority variants are shown to maintain a stable mutation-selection equilibrium under positive selection, particularly in the capsid gene region. CONCLUSIONS This study is the first to infer positive selection and the persistence of WNV haplotype variants that contribute to viral fitness without accompanying genetic change in the consensus genotype, documented solely from NGS sequence data. The approach used in this study streamlines the experimental design seeking viral minority variants accurately from NGS data whilst minimizing the influence of associated sequence error.
Affiliation(s)
- Cornell Kortenhoeven
- Poultry Section, Department of Production Animal Studies, Faculty of Veterinary Science, University of Pretoria, Old Soutpan Road, Onderstepoort, 0110, South Africa.
- Department of Zoology and Entomology, Faculty of Natural and Agricultural Sciences, Mammal Research Institute, University of Pretoria, Lynwood Road, Pretoria, South Africa.
- ARC-Onderstepoort Veterinary Institute, 100 Old Soutpan Road, Onderstepoort, 0110, South Africa.
| | - Fourie Joubert
- Department of Biochemistry, Faculty of Natural and Agricultural Sciences, University of Pretoria, Lynwood Road, Pretoria, South Africa.
| | - Armanda D S Bastos
- Department of Zoology and Entomology, Faculty of Natural and Agricultural Sciences, Mammal Research Institute, University of Pretoria, Lynwood Road, Pretoria, South Africa.
| | - Celia Abolnik
- Poultry Section, Department of Production Animal Studies, Faculty of Veterinary Science, University of Pretoria, Old Soutpan Road, Onderstepoort, 0110, South Africa.
- ARC-Onderstepoort Veterinary Institute, 100 Old Soutpan Road, Onderstepoort, 0110, South Africa.
| |
|