1
Deb S, Basu J, Choudhary M. An overview of next generation sequencing strategies and genomics tools used for tuberculosis research. J Appl Microbiol 2024; 135:lxae174. [PMID: 39003248 DOI: 10.1093/jambio/lxae174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 06/07/2024] [Accepted: 07/10/2024] [Indexed: 07/15/2024]
Abstract
Tuberculosis (TB) is a grave public health concern and is considered the foremost contributor to human mortality resulting from infectious disease. Due to the stringent clonality and extremely restricted genomic diversity, conventional methods prove inefficient for in-depth exploration of minor genomic variations and the evolutionary dynamics operating in Mycobacterium tuberculosis (M.tb) populations. Until now, the majority of reviews have primarily focused on delineating the application of whole-genome sequencing (WGS) in predicting antibiotic resistance genes, surveillance of drug-resistant strains, and M.tb lineage classifications. Despite the growing use of next generation sequencing (NGS) and WGS analysis in TB research, there are limited studies that provide a comprehensive summary of their role in studying macroevolution, minor genetic variations, assessing mixed TB infections, and tracking transmission networks at an individual level. This highlights the need for a systematic effort to fully explore the potential of WGS and its associated tools in advancing our understanding of TB epidemiology and disease transmission. We delve into the recent bioinformatics pipelines and NGS strategies that leverage various genetic features and simultaneous exploration of host-pathogen protein expression profiles to decipher the genetic heterogeneity and host-pathogen interaction dynamics of M.tb infections. This review highlights the potential benefits and limitations of NGS and bioinformatics tools and discusses their role in TB detection and epidemiology. Overall, this review could be a valuable resource for researchers and clinicians interested in NGS-based approaches in TB research.
Affiliation(s)
- Sushanta Deb
  - Department of Veterinary Microbiology and Pathology, College of Veterinary Medicine, Washington State University, Pullman 99164, WA, United States
  - All India Institute of Medical Sciences, New Delhi 110029, India
- Jhinuk Basu
  - Department of Clinical Immunology and Rheumatology, Kalinga Institute of Medical Sciences (KIMS), KIIT University, Bhubaneswar 751024, India
- Megha Choudhary
  - All India Institute of Medical Sciences, New Delhi 110029, India
2
Delgado S, Somovilla P, Ferrer-Orta C, Martínez-González B, Vázquez-Monteagudo S, Muñoz-Flores J, Soria ME, García-Crespo C, de Ávila AI, Durán-Pastor A, Gadea I, López-Galíndez C, Moran F, Lorenzo-Redondo R, Verdaguer N, Perales C, Domingo E. Incipient functional SARS-CoV-2 diversification identified through neural network haplotype maps. Proc Natl Acad Sci U S A 2024; 121:e2317851121. [PMID: 38416684 DOI: 10.1073/pnas.2317851121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 01/08/2024] [Indexed: 03/01/2024]
Abstract
Since its introduction in the human population, SARS-CoV-2 has evolved into multiple clades, but the events in its intrahost diversification are not well understood. Here, we compare three-dimensional (3D) self-organized neural haplotype maps (SOMs) of SARS-CoV-2 from thirty individual nasopharyngeal diagnostic samples obtained within a 19-day interval in Madrid (Spain), at the time of transition between clades 19 and 20. SOMs have been trained with the haplotype repertoire present in the mutant spectra of the nsp12- and spike (S)-coding regions. Each SOM consisted of a dominant neuron (displaying the maximum frequency), surrounded by a low-frequency neuron cloud. The sequence of the master (dominant) neuron was either identical to that of the reference Wuhan-Hu-1 genome or differed from it at one nucleotide position. Six different deviant haplotype sequences were identified among the master neurons. Some of the substitutions in the neural clouds affected critical sites of the nsp12-nsp8-nsp7 polymerase complex and resulted in altered kinetics of RNA synthesis in an in vitro primer extension assay. Thus, the analysis has identified mutations that are relevant to modification of viral RNA synthesis, present in the mutant clouds of SARS-CoV-2 quasispecies. These mutations most likely occurred during intrahost diversification in several COVID-19 patients, during an initial stage of the pandemic, and within a brief time period.
Affiliation(s)
- Soledad Delgado
  - Departamento de Sistemas Informáticos, Escuela Técnica Superior de Ingeniería de Sistemas Informáticos, Universidad Politécnica de Madrid, Madrid 28031, Spain
- Pilar Somovilla
  - Microbes in Health and Welfare Program, Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
  - Departamento de Biología Molecular, Universidad Autónoma de Madrid, Madrid 28049, Spain
- Cristina Ferrer-Orta
  - Structural and Molecular Biology Department, Institut de Biología Molecular de Barcelona, Consejo Superior de Investigaciones Científicas, Barcelona 08028, Spain
- Brenda Martínez-González
  - Department of Molecular and Cell Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
  - Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid, Madrid 28040, Spain
- Sergi Vázquez-Monteagudo
  - Structural and Molecular Biology Department, Institut de Biología Molecular de Barcelona, Consejo Superior de Investigaciones Científicas, Barcelona 08028, Spain
- María Eugenia Soria
  - Microbes in Health and Welfare Program, Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
  - Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid, Madrid 28040, Spain
- Carlos García-Crespo
  - Microbes in Health and Welfare Program, Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
- Ana Isabel de Ávila
  - Microbes in Health and Welfare Program, Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
- Antoni Durán-Pastor
  - Department of Molecular and Cell Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
- Ignacio Gadea
  - Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid, Madrid 28040, Spain
- Cecilio López-Galíndez
  - Unidad de Virología Molecular, Laboratorio de Referencia e Investigación en retrovirus, Centro Nacional de Microbiología, Instituto de salud Carlos III, Majadahonda 28222, Spain
- Federico Moran
  - Departamento de Bioquímica y Biología Molecular, Universidad Complutense de Madrid, Madrid 28040, Spain
- Ramon Lorenzo-Redondo
  - Department of Medicine, Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Center for Pathogen Genomics and Microbial Evolution, Northwestern University Havey Institute for Global Health, Chicago, IL 60611
- Nuria Verdaguer
  - Structural and Molecular Biology Department, Institut de Biología Molecular de Barcelona, Consejo Superior de Investigaciones Científicas, Barcelona 08028, Spain
- Celia Perales
  - Department of Molecular and Cell Biology, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
  - Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid, Madrid 28040, Spain
- Esteban Domingo
  - Microbes in Health and Welfare Program, Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Consejo Superior de Investigaciones Científicas, Madrid 28049, Spain
3
Senghore M, Read H, Oza P, Johnson S, Passarelli-Araujo H, Taylor BP, Ashley S, Grey A, Callendrello A, Lee R, Goddard MR, Lumley T, Hanage WP, Wiles S. Inferring bacterial transmission dynamics using deep sequencing genomic surveillance data. Nat Commun 2023; 14:6397. [PMID: 37907520 PMCID: PMC10618251 DOI: 10.1038/s41467-023-42211-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 09/27/2023] [Indexed: 11/02/2023]
Abstract
Identifying and interrupting transmission chains is important for controlling infectious diseases. One way to identify transmission pairs - two hosts in which infection was transmitted from one to the other - is using the variation of the pathogen within each single host (within-host variation). However, the role of such variation in transmission is understudied due to a lack of experimental and clinical datasets that capture pathogen diversity in both donor and recipient hosts. In this work, we assess the utility of deep-sequenced genomic surveillance (where genomic regions are sequenced hundreds to thousands of times) using a mouse transmission model involving controlled spread of the pathogenic bacterium Citrobacter rodentium from infected to naïve female animals. We observe that within-host single nucleotide variants (iSNVs) are maintained over multiple transmission steps and present a model for inferring the likelihood that a given pair of sequenced samples are linked by transmission. In this work we show that, beyond the presence and absence of within-host variants, differences arising in the relative abundance of iSNVs (allelic frequency) can infer transmission pairs more precisely. Our approach further highlights the critical role bottlenecks play in reserving the within-host diversity during transmission.
Affiliation(s)
- Madikay Senghore
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
- Hannah Read
  - Bioluminescent Superbugs Lab, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- Priyali Oza
  - Bioluminescent Superbugs Lab, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- Sarah Johnson
  - Bioluminescent Superbugs Lab, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- Hemanoel Passarelli-Araujo
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
  - Department of Biochemistry and Immunology, Federal University of Minas Gerais, Minas Gerais, Brazil
- Bradford P Taylor
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
- Stephen Ashley
  - Bioluminescent Superbugs Lab, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- Alex Grey
  - Bioluminescent Superbugs Lab, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
- Alanna Callendrello
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
- Robyn Lee
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
  - University of Toronto Dalla Lana School of Public Health, Toronto, ON, Canada
- Matthew R Goddard
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand
  - School of Life and Environmental Sciences, University of Lincoln, Lincoln, UK
- Thomas Lumley
  - Department of Statistics, University of Auckland, Auckland, New Zealand
- William P Hanage
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
- Siouxsie Wiles
  - Bioluminescent Superbugs Lab, Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
  - Te Pūnaha Matatini, Centre of Research Excellence in Complex Systems, Auckland, New Zealand
4
Juyal A, Hosseini R, Novikov D, Grinshpon M, Zelikovsky A. Reconstruction of Viral Variants via Monte Carlo Clustering. J Comput Biol 2023; 30:1009-1018. [PMID: 37695837 PMCID: PMC10518690 DOI: 10.1089/cmb.2023.0154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/13/2023]
Abstract
Identifying viral variants through clustering is essential for understanding the composition and structure of viral populations within and between hosts, which play a crucial role in disease progression and epidemic spread. This article proposes and validates novel Monte Carlo (MC) methods for clustering aligned viral sequences by minimizing either entropy or Hamming distance from consensuses. We validate these methods on four benchmarks: two SARS-CoV-2 interhost data sets and two HIV intrahost data sets. A parallelized version of our tool is scalable to very large data sets. We show that both entropy and Hamming distance-based MC clusterings discern the meaningful information from sequencing data. The proposed clustering methods consistently converge to similar clusterings across different runs. Finally, we show that MC clustering improves reconstruction of intrahost viral population from sequencing data.
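The two objectives named in this abstract (entropy and Hamming distance from cluster consensuses) can be illustrated with a minimal Python sketch. This is not the authors' tool: the function names, the size-weighted column-entropy cost, and the greedy "accept if not worse" move are assumptions chosen for brevity, and the sketch assumes equal-length aligned sequences.

```python
import math
import random
from collections import Counter


def consensus(seqs):
    """Column-wise majority character of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def hamming_cost(clusters):
    """Total Hamming distance from each sequence to its cluster consensus."""
    total = 0
    for cluster in clusters:
        cons = consensus(cluster)
        for seq in cluster:
            total += sum(a != b for a, b in zip(seq, cons))
    return total


def entropy_cost(clusters):
    """Sum over clusters and columns of cluster size times column entropy (bits).

    The size weighting is an assumption of this sketch.
    """
    total = 0.0
    for cluster in clusters:
        for column in zip(*cluster):
            n = len(column)
            counts = Counter(column)
            h = -sum((c / n) * math.log2(c / n) for c in counts.values())
            total += n * h
    return total


def mc_cluster(seqs, k, cost=hamming_cost, steps=2000, seed=0):
    """Randomly move one sequence at a time; keep moves that do not raise the cost."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in seqs]

    def groups(lab):
        return [[s for s, l in zip(seqs, lab) if l == j] for j in range(k)]

    best = cost(groups(labels))
    for _ in range(steps):
        i, new = rng.randrange(len(seqs)), rng.randrange(k)
        old = labels[i]
        if new == old:
            continue
        labels[i] = new
        candidate = cost(groups(labels))
        if candidate <= best:
            best = candidate
        else:
            labels[i] = old  # reject the move
    return labels, best
```

Calling `mc_cluster(seqs, k=2)` returns one label per sequence and the final cost; passing `cost=entropy_cost` switches the objective. A greedy search of this kind can stall in a local minimum, so in practice several seeds would be compared.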
Affiliation(s)
- Akshay Juyal
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Roya Hosseini
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Daniel Novikov
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Mark Grinshpon
  - Department of Mathematics and Statistics, Georgia State University, Atlanta, Georgia, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
5
Ke Z, Vikalo H. Graph-Based Reconstruction and Analysis of Disease Transmission Networks Using Viral Genomic Data. J Comput Biol 2023. [PMID: 37347892 DOI: 10.1089/cmb.2022.0373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/24/2023]
Abstract
Understanding the patterns of viral disease transmissions helps establish public health policies and aids in controlling and ending a disease outbreak. Classical methods for studying disease transmission dynamics that rely on epidemiological data, such as times of sample collection and duration of exposure intervals, struggle to provide desired insight due to limited informativeness of such data. A more precise characterization of disease transmissions may be acquired from sequencing data that reveal genetic distance between viral genomes in patient samples. Indeed, genetic distance between viral strains present in hosts contains valuable information about transmission history, thus motivating the design of methods that rely on genomic data to reconstruct a directed disease transmission network, detect transmission clusters, and identify significant network nodes (e.g., super-spreaders). In this article, we present a novel end-to-end framework for the analysis of viral transmissions utilizing viral genomic (sequencing) data. The proposed framework groups infected hosts into transmission clusters based on the reconstructed viral strains infecting them; the genetic distance between a pair of hosts is calculated using Earth Mover's Distance, and further used to infer transmission direction between the hosts. To quantify the significance of a host in the transmission network, the importance score is calculated by a graph convolutional autoencoder. The viral transmission network is represented by a directed minimum spanning tree obtained with Edmonds' algorithm, modified to incorporate constraints on the importance scores of the hosts. The proposed framework outperforms state-of-the-art techniques for the analysis of viral transmission dynamics in several experiments on semiexperimental as well as experimental data.
Affiliation(s)
- Ziqi Ke
  - Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas, USA
- Haris Vikalo
  - Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas, USA
6
Johnson PCD, Hägglund S, Näslund K, Meyer G, Taylor G, Orton RJ, Zohari S, Haydon DT, Valarcher JF. Evaluating the potential of whole-genome sequencing for tracing transmission routes in experimental infections and natural outbreaks of bovine respiratory syncytial virus. Vet Res 2022; 53:107. [PMID: 36510312 PMCID: PMC9746130 DOI: 10.1186/s13567-022-01127-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 09/09/2022] [Indexed: 12/14/2022]
Abstract
Bovine respiratory syncytial virus (BRSV) is a major cause of respiratory disease in cattle. Genomic sequencing can resolve phylogenetic relationships between virus populations, which can be used to infer transmission routes and potentially inform the design of biosecurity measures. Sequencing of short (<2000 nt) segments of the 15 000-nt BRSV genome has revealed geographic and temporal clustering of BRSV populations, but insufficient variation to distinguish viruses collected from herds infected close together in space and time. This study investigated the potential for whole-genome sequencing to reveal sufficient genomic variation for inferring transmission routes between herds. Next-generation sequencing (NGS) data were generated from experimental infections and from natural outbreaks in Jämtland and Uppsala counties in Sweden. Sufficient depth of coverage for analysis of consensus and sub-consensus sequence diversity was obtained from 47 and 20 samples, respectively. Few (range: 0-6 polymorphisms across the six experiments) consensus-level polymorphisms were observed along experimental transmissions. A much higher level of diversity (146 polymorphic sites) was found among the consensus sequences from the outbreak samples. The majority (144/146) of polymorphisms were between rather than within counties, suggesting that consensus whole-genome sequences show insufficient spatial resolution for inferring direct transmission routes, but might allow identification of outbreak sources at the regional scale. By contrast, within-sample diversity was generally higher in the experimental than the outbreak samples. Analyses to infer known (experimental) and suspected (outbreak) transmission links from within-sample diversity data were uninformative. In conclusion, analysis of the whole-genome sequence of BRSV from experimental samples discriminated between circulating isolates from distant areas, but insufficient diversity was observed between closely related isolates to aid local transmission route inference.
Affiliation(s)
- Paul C D Johnson
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, UK
- Sara Hägglund
  - HPIG. Unit of Ruminant Medicine. Department of Clinical Sciences, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
- Katarina Näslund
  - Department of Microbiology, National Veterinary Institute, SVA, Uppsala, Sweden
- Gilles Meyer
  - IHAP, Université de Toulouse, INRAE, ENVT, Toulouse, France
- Richard J Orton
  - MRC-University of Glasgow Centre for Virus Research, Glasgow, UK
- Siamak Zohari
  - Department of Microbiology, National Veterinary Institute, SVA, Uppsala, Sweden
- Daniel T Haydon
  - School of Biodiversity, One Health and Veterinary Medicine, University of Glasgow, Glasgow, UK
- Jean François Valarcher
  - HPIG. Unit of Ruminant Medicine. Department of Clinical Sciences, Swedish University of Agricultural Sciences (SLU), Uppsala, Sweden
7
Quasispecies Fitness Partition to Characterize the Molecular Status of a Viral Population. Negative Effect of Early Ribavirin Discontinuation in a Chronically Infected HEV Patient. Int J Mol Sci 2022; 23:ijms232314654. [PMID: 36498981 PMCID: PMC9739305 DOI: 10.3390/ijms232314654] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/05/2022] [Revised: 11/11/2022] [Accepted: 11/17/2022] [Indexed: 11/25/2022]
Abstract
The changes occurring in viral quasispecies populations during infection have been monitored using diversity indices, nucleotide diversity, and several other indices to summarize the quasispecies structure in a single value. In this study, we present a method to partition quasispecies haplotypes into four fractions according to their fitness: the master haplotype, rare haplotypes at two levels (those present at <0.1%, and those at 0.1−1%), and a fourth fraction that we term emerging haplotypes, present at frequencies >1%, but less than that of the master haplotype. We propose that by determining the changes occurring in the volume of the four quasispecies fitness fractions together with those of the Hill number profile we will be able to visualize and analyze the molecular changes in the composition of a quasispecies with time. To develop this concept, we used three data sets: a technical clone of the complete SARS-CoV-2 spike gene, a subset of data previously used in a study of rare haplotypes, and data from a clinical follow-up study of a patient chronically infected with HEV and treated with ribavirin. The viral response to ribavirin mutagenic treatment was selection of a rich set of synonymous haplotypes. The mutation spectrum was very complex at the nucleotide level, but at the protein (phenotypic/functional) level the pattern differed, showing a highly prevalent master phenotype. We discuss the putative implications of this observation in relation to mutagenic antiviral treatment.
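The four-fraction partition and the Hill number profile described in this abstract can be sketched in a few lines of Python. The thresholds (0.1% and 1%) come from the abstract; the function names, the dictionary keys, and the handling of ties for the master haplotype (only the first top-ranked haplotype counts as master) are assumptions of this sketch, not the authors' implementation.

```python
import math


def fitness_partition(counts):
    """Split haplotype read counts into the four fractions named in the abstract."""
    total = sum(counts)
    props = sorted((c / total for c in counts), reverse=True)
    master, rest = props[0], props[1:]
    return {
        "master": master,
        # emerging: above 1% but not the master haplotype
        "emerging": sum(p for p in rest if p > 0.01),
        # rare haplotypes at 0.1% to 1%
        "rare_0.1_to_1": sum(p for p in rest if 0.001 <= p <= 0.01),
        # rare haplotypes below 0.1%
        "rare_below_0.1": sum(p for p in rest if p < 0.001),
    }


def hill_number(props, q):
    """Hill number of order q for a list of haplotype proportions."""
    props = [p for p in props if p > 0]
    if q == 1:
        # limit as q -> 1: exponential of the Shannon entropy
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))
```

Tracking the four fractions together with `hill_number` over a range of `q` values at successive sampling times gives the kind of longitudinal view the abstract proposes.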
8
Chao E, Chato C, Vender R, Olabode AS, Ferreira RC, Poon AFY. Molecular source attribution. PLoS Comput Biol 2022; 18:e1010649. [PMID: 36395093 PMCID: PMC9671344 DOI: 10.1371/journal.pcbi.1010649] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Elisa Chao
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Connor Chato
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Reid Vender
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
  - School of Medicine, Queen’s University, Kingston, Ontario, Canada
- Abayomi S. Olabode
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Roux-Cil Ferreira
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Art F. Y. Poon
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
  - * E-mail:
9
Skums P, Mohebbi F, Tsyvina V, Baykal PI, Nemira A, Ramachandran S, Khudyakov Y. SOPHIE: Viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework. Cell Syst 2022; 13:844-856.e4. [PMID: 36265470 PMCID: PMC9590096 DOI: 10.1016/j.cels.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 07/05/2022] [Accepted: 07/19/2022] [Indexed: 01/26/2023]
Abstract
Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, maximum parsimony principle continues to be employed for phylogenetic inference of transmission histories, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes study of persistent infections in vulnerable groups, where such information has a limited use. Finally, most methods implicitly assume that transmission events are independent, although common source outbreaks violate this assumption. We propose a maximum likelihood framework, SOPHIE, based on the integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modeled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity, and accurately infers transmissions without case-specific epidemiological data.
Affiliation(s)
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Fatemeh Mohebbi
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Vyacheslav Tsyvina
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Pelin Icer Baykal
  - Department of Biosystems Science & Engineering, ETH Zurich, Basel, Switzerland
- Alina Nemira
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Sumathi Ramachandran
  - Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yury Khudyakov
  - Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
10
Lundgren E, Romero-Severson E, Albert J, Leitner T. Combining biomarker and virus phylogenetic models improves HIV-1 epidemiological source identification. PLoS Comput Biol 2022; 18:e1009741. [PMID: 36026480 PMCID: PMC9455879 DOI: 10.1371/journal.pcbi.1009741] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/09/2021] [Revised: 09/08/2022] [Accepted: 08/02/2022] [Indexed: 01/07/2023]
Abstract
To identify and stop active HIV transmission chains, new epidemiological techniques are needed. Here, we describe the development of a multi-biomarker augmentation to phylogenetic inference of the underlying transmission history in a local population. HIV biomarkers are measurable biological quantities that have some relationship to the amount of time someone has been infected with HIV. To train our model, we used five biomarkers based on real data from serological assays, HIV sequence data, and target cell counts in longitudinally followed, untreated patients with known infection times. The biomarkers were modeled with a mixed effects framework to allow for patient specific variation and general trends, and fit to patient data using Markov Chain Monte Carlo (MCMC) methods. Subsequently, the density of the unobserved infection time conditional on observed biomarkers was obtained by integrating out the random effects from the model fit. This probabilistic information about infection times was incorporated into the likelihood function for the transmission history and phylogenetic tree reconstruction, informed by the HIV sequence data. To critically test our methodology, we developed a coalescent-based simulation framework that generates phylogenies and biomarkers given a specific or general transmission history. Testing on many epidemiological scenarios showed that biomarker augmented phylogenetics can reach 90% accuracy under idealized situations. Under realistic within-host HIV-1 evolution, involving substantial within-host diversification and frequent transmission of multiple lineages, the average accuracy was at about 50% in transmission clusters involving 5-50 hosts. Realistic biomarker data added on average 16 percentage points over using the phylogeny alone. Using more biomarkers improved the performance. Shorter temporal spacing between transmission events and increased transmission heterogeneity reduced reconstruction accuracy, but larger clusters were not harder to get right. More sequence data per infected host also improved accuracy. We show that the method is robust to incomplete sampling and that adding biomarkers improves reconstructions of real HIV-1 transmission histories. The technology presented here could allow for better prevention programs by providing data for locally informed and tailored strategies.
Affiliation(s)
- Erik Lundgren
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Ethan Romero-Severson
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Jan Albert
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
- Thomas Leitner
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
  - * E-mail:
11
Xi X, Spencer SEF, Hall M, Grabowski MK, Kagaayi J, Ratmann O. Inferring the sources of HIV infection in Africa from deep‐sequence data with semi‐parametric Bayesian Poisson flow models. J R Stat Soc Ser C Appl Stat 2022. [DOI: 10.1111/rssc.12544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Xiaoyue Xi
  - Department of Mathematics, Imperial College London, London, UK
- Matthew Hall
  - Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- M. Kate Grabowski
  - Department of Pathology, Johns Hopkins University, Baltimore, MD, USA
  - Rakai Health Sciences Program, Kalisizo, Uganda
- Joseph Kagaayi
  - Rakai Health Sciences Program, Kalisizo, Uganda
  - Makerere University School of Public Health, Kampala, Uganda
- Oliver Ratmann
  - Department of Mathematics, Imperial College London, London, UK
12
Guang A, Howison M, Ledingham L, D’Antuono M, Chan PA, Lawrence C, Dunn CW, Kantor R. Incorporating Within-Host Diversity in Phylogenetic Analyses for Detecting Clusters of New HIV Diagnoses. Front Microbiol 2022; 12:803190. [PMID: 35250908 PMCID: PMC8891961 DOI: 10.3389/fmicb.2021.803190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Accepted: 12/22/2021] [Indexed: 11/29/2022]
Abstract
Background Phylogenetic analyses of HIV sequences are used to detect clusters and inform public health interventions. Conventional approaches summarize within-host HIV diversity with a single consensus sequence per host of the pol gene, obtained from Sanger or next-generation sequencing (NGS). There is growing recognition that this approach discards potentially important information about within-host sequence variation, which can impact phylogenetic inference. However, whether alternative summary methods that incorporate intra-host variation impact phylogenetic inference of transmission network features is unknown. Methods We introduce profile sampling, a method to incorporate within-host NGS sequence diversity into phylogenetic HIV cluster inference. We compare this approach to Sanger- and NGS-derived pol and near-whole-genome consensus sequences and evaluate its potential benefits in identifying molecular clusters among all newly-HIV-diagnosed individuals over six months at the largest HIV center in Rhode Island. Results Profile sampling cluster inference demonstrated that within-host viral diversity impacts phylogenetic inference across individuals, and that consensus sequence approaches can obscure both magnitude and effect of these impacts. Clustering differed between Sanger- and NGS-derived consensus and profile sampling sequences, and across gene regions. Discussion Profile sampling can incorporate within-host HIV diversity captured by NGS into phylogenetic analyses. This additional information can improve robustness of cluster detection.
Affiliation(s)
- August Guang
- Center for Computational Biology of Human Disease, Brown University, Providence, RI, United States
- Center for Computation and Visualization, Brown University, Providence, RI, United States
- *Correspondence: August Guang,
| | - Mark Howison
- Research Improving People’s Lives, Providence, RI, United States
| | - Lauren Ledingham
- Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
| | - Matthew D’Antuono
- Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
| | - Philip A. Chan
- Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
| | - Charles Lawrence
- Division of Applied Mathematics, Brown University, Providence, RI, United States
| | - Casey W. Dunn
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
| | - Rami Kantor
- Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
|
13
|
Xu R, Aranday-Cortes E, Leitch ECM, Hughes J, Singer JB, Sreenu V, Tong L, da Silva Filipe A, Bamford CGG, Rong X, Huang J, Wang M, Fu Y, McLauchlan J. The evolutionary dynamics and epidemiological history of hepatitis C virus genotype 6, including unique strains from the Li community of Hainan Island, China. Virus Evol 2022; 8:veac012. [PMID: 35600095 PMCID: PMC9115904 DOI: 10.1093/ve/veac012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/01/2021] [Revised: 01/17/2022] [Accepted: 02/15/2022] [Indexed: 12/09/2022]
Abstract
Hepatitis C virus (HCV) is a highly diverse pathogen that frequently establishes a chronic long-term infection, but the origins and drivers of HCV diversity in the human population remain unclear. Previously unidentified strains of HCV genotype 6 (gt6) were recently discovered in chronically infected individuals of the Li ethnic group living in Baisha County, Hainan Island, China. The Li community, who were early settlers on Hainan Island, has a distinct host genetic background and cultural identity compared to other ethnic groups on the island and mainland China. In this report, we generated 33 whole virus genome sequences to conduct a comprehensive molecular epidemiological analysis of these novel gt6 strains in the context of gt6 isolates present in Southeast Asia. With the exception of one gt6a isolate, the Li gt6 sequences formed three novel clades from two lineages which constituted 3 newly assigned gt6 subtypes and 30 unassigned strains. Using Bayesian inference methods, we dated the most recent common ancestor for all available gt6 whole virus genome sequences to approximately 2767 bce (95 per cent highest posterior density (HPD) intervals, 3670-1397 bce), which is far earlier than previous estimates. The substitution rate was 1.20 × 10⁻⁴ substitutions/site/year (s/s/y), and this rate varied across the genome regions, from 1.02 × 10⁻⁵ s/s/y in the 5' untranslated region (UTR) to 3.07 × 10⁻⁴ s/s/y in E2. Thus, our study on an isolated ethnic minority group within a small geographical area of Hainan Island has substantially increased the known diversity of HCV gt6, already acknowledged as the most diverse HCV genotype. The extant HCV gt6 sequences from this study were probably transmitted to the Li through at least three independent events dating perhaps from around 4,000 years ago. This analysis describes deeper insight into basic aspects of HCV gt6 molecular evolution including the extensive diversity of gt6 sequences in the isolated Li ethnic group.
Affiliation(s)
| | - Elihu Aranday-Cortes
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - E Carol McWilliam Leitch
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - Joseph Hughes
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - Joshua B Singer
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - Vattipally Sreenu
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - Lily Tong
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - Ana da Silva Filipe
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - Connor G G Bamford
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
| | - Xia Rong
- Guangzhou Blood Center, Institute of Clinical Blood Transfusion, Guangzhou Blood Center, 31 LuYuan Road, Guangzhou, Guangdong 510095, P.R. China
| | - Jieting Huang
- Guangzhou Blood Center, Institute of Clinical Blood Transfusion, Guangzhou Blood Center, 31 LuYuan Road, Guangzhou, Guangdong 510095, P.R. China
| | - Min Wang
- Guangzhou Blood Center, Institute of Clinical Blood Transfusion, Guangzhou Blood Center, 31 LuYuan Road, Guangzhou, Guangdong 510095, P.R. China
| | - Yongshui Fu
- Guangzhou Blood Center, Institute of Clinical Blood Transfusion, Guangzhou Blood Center, 31 LuYuan Road, Guangzhou, Guangdong 510095, P.R. China
| | - John McLauchlan
- MRC-University of Glasgow Centre for Virus Research, Sir Michael Stoker Building, Garscube Campus, 464 Bearsden Road, Glasgow G61 1QH, UK
- Guangzhou Blood Center, Institute of Clinical Blood Transfusion, Guangzhou Blood Center, 31 LuYuan Road, Guangzhou, Guangdong 510095, P.R. China
|
14
|
Gussler JW, Campo DS, Dimitrova Z, Skums P, Khudyakov Y. Primary case inference in viral outbreaks through analysis of intra-host variant population. BMC Bioinformatics 2022; 23:62. [PMID: 35135469 PMCID: PMC8822801 DOI: 10.1186/s12859-022-04585-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2020] [Accepted: 01/25/2022] [Indexed: 11/21/2022]
Abstract
Background Investigation of outbreaks to identify the primary case is crucial for the interruption and prevention of transmission of infectious diseases. These individuals may have a higher risk of participating in near future transmission events when compared to the other patients in the outbreak, so directing more transmission prevention resources towards these individuals is a priority. Although the genetic characterization of intra-host viral populations can aid the identification of transmission clusters, it is not trivial to determine the directionality of transmissions during outbreaks, owing to complexity of viral evolution. Here, we present a new computational framework, PYCIVO: primary case inference in viral outbreaks. This framework expands upon our earlier work in development of QUENTIN, which builds a probabilistic disease transmission tree based on simulation of evolution of intra-host hepatitis C virus (HCV) variants between cases involved in direct transmission during an outbreak. PYCIVO improves upon QUENTIN by also adding a custom heterogeneity index and identifying the scenario when the primary case may have not been sampled. Results These approaches were validated using a set of 105 sequence samples from 11 distinct HCV transmission clusters identified during outbreak investigations, in which the primary case was epidemiologically verified. Both models can detect the correct primary case in 9 out of 11 transmission clusters (81.8%). However, while QUENTIN issues erroneous predictions on the remaining 2 transmission clusters, PYCIVO issues a null output for these clusters, giving it an effective prediction accuracy of 100%. To further evaluate accuracy of the inference, we created 10 modified transmission clusters in which the primary case had been removed. In this scenario, PYCIVO was able to correctly identify that there was no primary case in 8/10 (80%) of these modified clusters. 
This model was validated with HCV; however, this approach may be applicable to other microbial pathogens. Conclusions PYCIVO improves upon QUENTIN by also implementing a custom heterogeneity index which empowers PYCIVO to make the important ‘No primary case’ prediction. One or more samples, possibly including the primary case, may have not been sampled, and this designation is meant to account for these scenarios.
Affiliation(s)
- J Walker Gussler
- Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA, 30333, USA
- Department of Computer Science, Georgia State University, 1 Park Place NE, Atlanta, GA, 30303, USA
| | - David S Campo
- Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA, 30333, USA
| | - Zoya Dimitrova
- Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA, 30333, USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, 1 Park Place NE, Atlanta, GA, 30303, USA
| | - Yury Khudyakov
- Centers for Disease Control and Prevention, 1600 Clifton Rd, Atlanta, GA, 30333, USA
|
15
|
Dhar S, Zhang C, Măndoiu II, Bansal MS. TNet: Transmission Network Inference Using Within-Host Strain Diversity and its Application to Geographical Tracking of COVID-19 Spread. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:230-242. [PMID: 34255632 PMCID: PMC8956368 DOI: 10.1109/tcbb.2021.3096455] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/06/2020] [Revised: 07/03/2021] [Accepted: 07/08/2021] [Indexed: 06/13/2023]
Abstract
The inference of disease transmission networks is an important problem in epidemiology. One popular approach for building transmission networks is to reconstruct a phylogenetic tree using sequences from disease strains sampled from infected hosts and infer transmissions based on this tree. However, most existing phylogenetic approaches for transmission network inference are highly computationally intensive and cannot take within-host strain diversity into account. Here, we introduce a new phylogenetic approach for inferring transmission networks, TNet, that addresses these limitations. TNet uses multiple strain sequences from each sampled host to infer transmissions and is simpler and more accurate than existing approaches. Furthermore, TNet is highly scalable and able to distinguish between ambiguous and unambiguous transmission inferences. We evaluated TNet on a large collection of 560 simulated transmission networks of various sizes and diverse host, sequence, and transmission characteristics, as well as on 10 real transmission datasets with known transmission histories. Our results show that TNet outperforms two other recently developed methods, phyloscanner and SharpTNI, that also consider within-host strain diversity. We also applied TNet to a large collection of SARS-CoV-2 genomes sampled from infected individuals in many countries around the world, demonstrating how our inference framework can be adapted to accurately infer geographical transmission networks. TNet is freely available from https://compbio.engr.uconn.edu/software/TNet/.
Affiliation(s)
- Saurav Dhar
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Chengchen Zhang
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Ion I. Măndoiu
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
| | - Mukul S. Bansal
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
|
16
|
Mäklin T, Kallonen T, Alanko J, Samuelsen Ø, Hegstad K, Mäkinen V, Corander J, Heinz E, Honkela A. Bacterial genomic epidemiology with mixed samples. Microb Genom 2021; 7:000691. [PMID: 34779765 PMCID: PMC8743562 DOI: 10.1099/mgen.0.000691] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/10/2021] [Accepted: 09/13/2021] [Indexed: 11/18/2022]
Abstract
Genomic epidemiology is a tool for tracing transmission of pathogens based on whole-genome sequencing. We introduce the mGEMS pipeline for genomic epidemiology with plate sweeps representing mixed samples of a target pathogen, opening the possibility to sequence all colonies on selective plates with a single DNA extraction and sequencing step. The pipeline includes the novel mGEMS read binner for probabilistic assignments of sequencing reads, and the scalable pseudoaligner Themisto. We demonstrate the effectiveness of our approach using closely related samples in a nosocomial setting, obtaining results that are comparable to those based on single-colony picks. Our results lend firm support to more widespread consideration of genomic epidemiology with mixed infection samples.
Affiliation(s)
- Tommi Mäklin
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Teemu Kallonen
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Jarno Alanko
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Ørjan Samuelsen
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Department of Pharmacy, UiT The Arctic University of Norway, Tromsø, Norway
| | - Kristin Hegstad
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Research group for Host-Microbe Interactions, Department of Medical Biology, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
| | - Veli Mäkinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
| | - Jukka Corander
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Eva Heinz
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Liverpool School of Tropical Medicine, Liverpool, UK
| | - Antti Honkela
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
|
17
|
Abstract
Viral quasispecies are dynamic distributions of nonidentical but closely related mutant and recombinant viral genomes subjected to a continuous process of genetic variation, competition, and selection that may act as a unit of selection. The quasispecies concept owes its theoretical origins to a model for the origin of life as a collection of mutant RNA replicators. Independently, experimental evidence for the quasispecies concept was obtained from sampling of bacteriophage clones, which revealed that the viral populations consisted of many mutant genomes whose frequency varied with time of replication. Similar findings were made in animal and plant RNA viruses. Quasispecies became a theoretical framework to understand viral population dynamics and adaptability. The evidence came at a time when mutations were considered rare events in genetics, a perception that was to change dramatically in subsequent decades. Indeed, viral quasispecies was the conceptual forefront of a remarkable degree of biological diversity, now evident for cell populations and organisms, not only for viruses. Quasispecies dynamics unveiled complexities in the behavior of viral populations, with consequences for disease mechanisms and control strategies. This review addresses the origin of the quasispecies concept, its major implications on both viral evolution and antiviral strategies, and current and future prospects.
Affiliation(s)
- Esteban Domingo
- Department of Interactions with the Environment, Centro de Biología Molecular Severo Ochoa (CBMSO), Consejo Superior de Investigaciones Científicas (CSIC), 28049 Madrid, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
| | - Carlos García-Crespo
- Department of Interactions with the Environment, Centro de Biología Molecular Severo Ochoa (CBMSO), Consejo Superior de Investigaciones Científicas (CSIC), 28049 Madrid, Spain
| | - Celia Perales
- Department of Interactions with the Environment, Centro de Biología Molecular Severo Ochoa (CBMSO), Consejo Superior de Investigaciones Científicas (CSIC), 28049 Madrid, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
- Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28040 Madrid, Spain
|
18
|
Orlovich Y, Kukharenko K, Kaibel V, Skums P. Scale-Free Spanning Trees and Their Application in Genomic Epidemiology. J Comput Biol 2021; 28:945-960. [PMID: 34491104 PMCID: PMC8670573 DOI: 10.1089/cmb.2020.0500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
We study the algorithmic problem of finding the most “scale-free-like” spanning tree of a connected graph. This problem is motivated by the fundamental problem of genomic epidemiology: given viral genomes sampled from infected individuals, reconstruct the transmission network (“who infected whom”). We use two possible objective functions for this problem and introduce the corresponding algorithmic problems termed m-SF (m-scale free) and s-SF Spanning Tree problems. We prove that those problems are APX- and NP-hard, respectively, even in the classes of cubic and bipartite graphs. We propose two integer linear programming (ILP) formulations for the s-SF Spanning Tree problem, and experimentally assess its performance using simulated and experimental data. In particular, we demonstrate that the ILP-based approach allows for accurate reconstruction of transmission histories of several hepatitis C outbreaks.
Affiliation(s)
- Yury Orlovich
- Faculty of Applied Mathematics and Computer Science, Belarusian State University, Minsk, Belarus
| | - Kirill Kukharenko
- Institute for Mathematical Optimization, Otto von Guericke University Magdeburg, Magdeburg, Germany
| | - Volker Kaibel
- Institute for Mathematical Optimization, Otto von Guericke University Magdeburg, Magdeburg, Germany
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
|
19
|
Berry IM, Melendrez MC, Pollett S, Figueroa K, Buddhari D, Klungthong C, Nisalak A, Panciera M, Thaisomboonsuk B, Li T, Vallard TG, Macareo L, Yoon IK, Thomas SJ, Endy T, Jarman RG. Precision Tracing of Household Dengue Spread Using Inter- and Intra-Host Viral Variation Data, Kamphaeng Phet, Thailand. Emerg Infect Dis 2021; 27:1637-1644. [PMID: 34013878 PMCID: PMC8153871 DOI: 10.3201/eid2706.204323] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/18/2022]
Abstract
Dengue control approaches are best informed by granular spatial epidemiology of these viruses, yet reconstruction of inter- and intra-household transmissions is limited when analyzing case count, serologic, or genomic consensus sequence data. To determine viral spread on a finer spatial scale, we extended phylogenomic discrete trait analyses to reconstructions of house-to-house transmissions within a prospective cluster study in Kamphaeng Phet, Thailand. For additional resolution and transmission confirmation, we mapped dengue intra-host single nucleotide variants on the taxa of these time-scaled phylogenies. This approach confirmed 19 household transmissions and revealed that dengue disperses an average of 70 m per day between households in these communities. We describe an evolutionary biology framework for the resolution of dengue transmissions that cannot be differentiated based on epidemiologic and consensus genome data alone. This framework can be used as a public health tool to inform control approaches and enable precise tracing of dengue transmissions.
|
20
|
Knyazev S, Tsyvina V, Shankar A, Melnyk A, Artyomenko A, Malygina T, Porozov YB, Campbell EM, Switzer WM, Skums P, Mangul S, Zelikovsky A. Accurate assembly of minority viral haplotypes from next-generation sequencing through efficient noise reduction. Nucleic Acids Res 2021; 49:e102. [PMID: 34214168 PMCID: PMC8464054 DOI: 10.1093/nar/gkab576] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/09/2020] [Revised: 05/25/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022]
Abstract
Rapidly evolving RNA viruses continuously produce minority haplotypes that can become dominant if they are drug-resistant or can better evade the immune system. Therefore, early detection and identification of minority viral haplotypes may help to promptly adjust the patient’s treatment plan preventing potential disease complications. Minority haplotypes can be identified using next-generation sequencing, but sequencing noise hinders accurate identification. The elimination of sequencing noise is a non-trivial task that still remains open. Here we propose CliqueSNV based on extracting pairs of statistically linked mutations from noisy reads. This effectively reduces sequencing noise and enables identifying minority haplotypes with the frequency below the sequencing error rate. We comparatively assess the performance of CliqueSNV using an in vitro mixture of nine haplotypes that were derived from the mutation profile of an existing HIV patient. We show that CliqueSNV can accurately assemble viral haplotypes with frequencies as low as 0.1% and maintains consistent performance across short and long bases sequencing platforms.
Affiliation(s)
- Sergey Knyazev
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
- Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
| | - Viachaslau Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
| | - Anupama Shankar
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - Andrew Melnyk
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
| | | | - Tatiana Malygina
- International Scientific and Research Institute of Bioengineering, ITMO University, St. Petersburg 197101, Russia
| | - Yuri B Porozov
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Department of Computational Biology, Sirius University of Science and Technology, Sochi 354340, Russia
| | - Ellsworth M Campbell
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - William M Switzer
- Division of HIV Prevention, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
| | - Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA 90089, USA
| | - Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA
- World-Class Research Center "Digital biodesign and personalized healthcare", I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
|
21
|
Valesano AL, Rumfelt KE, Dimcheff DE, Blair CN, Fitzsimmons WJ, Petrie JG, Martin ET, Lauring AS. Temporal dynamics of SARS-CoV-2 mutation accumulation within and across infected hosts. PLoS Pathog 2021; 17:e1009499. [PMID: 33826681 PMCID: PMC8055005 DOI: 10.1371/journal.ppat.1009499] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 01/21/2021] [Revised: 04/19/2021] [Accepted: 03/24/2021] [Indexed: 01/12/2023]
Abstract
Analysis of SARS-CoV-2 genetic diversity within infected hosts can provide insight into the generation and spread of new viral variants and may enable high resolution inference of transmission chains. However, little is known about temporal aspects of SARS-CoV-2 intrahost diversity and the extent to which shared diversity reflects convergent evolution as opposed to transmission linkage. Here we use high depth of coverage sequencing to identify within-host genetic variants in 325 specimens from hospitalized COVID-19 patients and infected employees at a single medical center. We validated our variant calling by sequencing defined RNA mixtures and identified viral load as a critical factor in variant identification. By leveraging clinical metadata, we found that intrahost diversity is low and does not vary by time from symptom onset. This suggests that variants will only rarely rise to appreciable frequency prior to transmission. Although there was generally little shared variation across the sequenced cohort, we identified intrahost variants shared across individuals who were unlikely to be related by transmission. These variants did not precede a rise in frequency in global consensus genomes, suggesting that intrahost variants may have limited utility for predicting future lineages. These results provide important context for sequence-based inference in SARS-CoV-2 evolution and epidemiology.
Affiliation(s)
- Andrew L. Valesano
- Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Kalee E. Rumfelt
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Derek E. Dimcheff
- Division of Hospital Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Christopher N. Blair
- Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - William J. Fitzsimmons
- Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Joshua G. Petrie
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Emily T. Martin
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Adam S. Lauring
- Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
|
22
|
Ramazzotti D, Angaroni F, Maspero D, Gambacorti-Passerini C, Antoniotti M, Graudenzi A, Piazza R. VERSO: A comprehensive framework for the inference of robust phylogenies and the quantification of intra-host genomic diversity of viral samples. Patterns (N Y) 2021; 2:100212. [PMID: 33728416 PMCID: PMC7953447 DOI: 10.1016/j.patter.2021.100212] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 06/12/2020] [Revised: 11/30/2020] [Accepted: 01/22/2021] [Indexed: 12/22/2022]
Abstract
We introduce VERSO, a two-step framework for the characterization of viral evolution from sequencing data of viral genomes, which is an improvement on phylogenomic approaches for consensus sequences. VERSO exploits an efficient algorithmic strategy to return robust phylogenies from clonal variant profiles, also in conditions of sampling limitations. It then leverages variant frequency patterns to characterize the intra-host genomic diversity of samples, revealing undetected infection chains and pinpointing variants likely involved in homoplasies. On simulations, VERSO outperforms state-of-the-art tools for phylogenetic inference. Notably, the application to 6,726 amplicon and RNA sequencing samples refines the estimation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution, while co-occurrence patterns of minor variants unveil undetected infection paths, which are validated with contact tracing data. Finally, the analysis of SARS-CoV-2 mutational landscape uncovers a temporal increase of overall genomic diversity and highlights variants transiting from minor to clonal state and homoplastic variants, some of which fall on the spike gene. Available at: https://github.com/BIMIB-DISCo/VERSO.
Affiliation(s)
- Daniele Ramazzotti
  - Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
- Fabrizio Angaroni
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
- Davide Maspero
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
  - Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
- Marco Antoniotti
  - Department of Informatics, Systems and Communication, Università degli Studi di Milano-Bicocca, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
- Alex Graudenzi
  - Inst. of Molecular Bioimaging and Physiology, Consiglio Nazionale delle Ricerche (IBFM-CNR), Segrate, Milan, Italy
  - Bicocca Bioinformatics, Biostatistics and Bioimaging Centre – B4, Milan, Italy
- Rocco Piazza
  - Department of Medicine and Surgery, Università degli Studi di Milano-Bicocca, Monza, Italy
|
23
|
Maljkovic Berry I, Melendrez MC, Bishop-Lilly KA, Rutvisuttinunt W, Pollett S, Talundzic E, Morton L, Jarman RG. Next Generation Sequencing and Bioinformatics Methodologies for Infectious Disease Research and Public Health: Approaches, Applications, and Considerations for Development of Laboratory Capacity. J Infect Dis 2020; 221:S292-S307. [PMID: 31612214 DOI: 10.1093/infdis/jiz286] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Indexed: 01/07/2023] Open
Abstract
Next generation sequencing (NGS) combined with bioinformatics has successfully been used in a vast array of analyses for infectious disease research of public health relevance. For instance, NGS and bioinformatics approaches have been used to identify outbreak origins, track transmissions, investigate epidemic dynamics, determine etiological agents of a disease, and discover novel human pathogens. However, implementation of high-quality NGS and bioinformatics in research and public health laboratories can be challenging. These challenges mainly include the choice of the sequencing platform and the sequencing approach, the choice of bioinformatics methodologies, access to the appropriate computation and information technology infrastructure, and recruiting and retaining personnel with the specialized skills and experience in this field. In this review, we summarize the most common NGS and bioinformatics workflows in the context of infectious disease genomic surveillance and pathogen discovery, and highlight the main challenges and considerations for setting up an NGS and bioinformatics-focused infectious disease research public health laboratory. We describe the most commonly used sequencing platforms and review their strengths and weaknesses. We review sequencing approaches that have been used for various pathogens and study questions, as well as the most common difficulties associated with these approaches that should be considered when implementing in a public health or research setting. In addition, we provide a review of some common bioinformatics tools and procedures used for pathogen discovery and genome assembly, along with the most common challenges and solutions. Finally, we summarize the bioinformatics of advanced viral, bacterial, and parasite pathogen characterization, including types of study questions that can be answered when utilizing NGS and bioinformatics.
Affiliation(s)
- Irina Maljkovic Berry
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland
- Kimberly A Bishop-Lilly
  - Genomics and Bioinformatics Department, Biological Defense Research Directorate, Naval Medical Research Center-Frederick, Fort Detrick, Maryland
- Wiriya Rutvisuttinunt
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland
- Simon Pollett
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland
  - Department of Preventive Medicine and Biostatistics, Uniformed Services University of the Health Sciences, Bethesda, Maryland
- Eldin Talundzic
  - Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, Georgia
- Lindsay Morton
  - Global Emerging Infections Surveillance, Armed Forces Health Surveillance Branch, Silver Spring, Maryland
- Richard G Jarman
  - Viral Diseases Branch, Walter Reed Army Institute of Research, Silver Spring, Maryland
|
24
|
Valesano AL, Rumfelt KE, Dimcheff DE, Blair CN, Fitzsimmons WJ, Petrie JG, Martin ET, Lauring AS. Temporal dynamics of SARS-CoV-2 mutation accumulation within and across infected hosts. bioRxiv 2021:2021.01.19.427330. [PMID: 33501443 PMCID: PMC7836113 DOI: 10.1101/2021.01.19.427330] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/17/2022]
Abstract
Analysis of SARS-CoV-2 genetic diversity within infected hosts can provide insight into the generation and spread of new viral variants and may enable high resolution inference of transmission chains. However, little is known about temporal aspects of SARS-CoV-2 intrahost diversity and the extent to which shared diversity reflects convergent evolution as opposed to transmission linkage. Here we use high depth of coverage sequencing to identify within-host genetic variants in 325 specimens from hospitalized COVID-19 patients and infected employees at a single medical center. We validated our variant calling by sequencing defined RNA mixtures and identified a viral load threshold that minimizes false positives. By leveraging clinical metadata, we found that intrahost diversity is low and does not vary by time from symptom onset. This suggests that variants will only rarely rise to appreciable frequency prior to transmission. Although there was generally little shared variation across the sequenced cohort, we identified intrahost variants shared across individuals who were unlikely to be related by transmission. These variants did not precede a rise in frequency in global consensus genomes, suggesting that intrahost variants may have limited utility for predicting future lineages. These results provide important context for sequence-based inference in SARS-CoV-2 evolution and epidemiology.
Affiliation(s)
- Andrew L. Valesano
  - Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Kalee E. Rumfelt
  - Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Derek E. Dimcheff
  - Division of Hospital Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Christopher N. Blair
  - Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- William J. Fitzsimmons
  - Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Joshua G. Petrie
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Emily T. Martin
  - Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA
- Adam S. Lauring
  - Division of Infectious Diseases, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
|
25
|
Knyazev S, Hughes L, Skums P, Zelikovsky A. Epidemiological data analysis of viral quasispecies in the next-generation sequencing era. Brief Bioinform 2021; 22:96-108. [PMID: 32568371 PMCID: PMC8485218 DOI: 10.1093/bib/bbaa101] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/23/2019] [Revised: 04/24/2020] [Accepted: 05/04/2020] [Indexed: 01/04/2023] Open
Abstract
The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.
|
26
|
Basodi S, Baykal PI, Zelikovsky A, Skums P, Pan Y. Analysis of heterogeneous genomic samples using image normalization and machine learning. BMC Genomics 2020; 21:405. [PMID: 33349236 PMCID: PMC7751093 DOI: 10.1186/s12864-020-6661-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Analysis of heterogeneous populations such as viral quasispecies is one of the most challenging bioinformatics problems. Although machine learning models are becoming widely employed for analysis of sequence data from such populations, their straightforward application is impeded by multiple challenges associated with technological limitations and biases, the difficulty of selecting relevant features, and the need to compare genomic datasets of different sizes and structures. RESULTS We propose a novel preprocessing approach to transform irregular genomic data into normalized image data. Such a representation allows the problems of classification and comparison of heterogeneous populations to be restated as image classification problems, which can be solved using a variety of available machine learning tools. We then apply the proposed approach to two important problems in molecular epidemiology: inference of viral infection stage and detection of viral transmission clusters using next-generation sequencing data. The infection staging method has been applied to HCV HVR1 samples collected from 108 recently and 257 chronically infected individuals. The SVM-based image classification approach achieved more than 95% accuracy for both recently and chronically HCV-infected individuals. Clustering has been performed on the data collected from 33 epidemiologically curated outbreaks, yielding more than 97% accuracy. CONCLUSIONS The sequence image normalization method allows for a robust conversion of genomic data into numerical data and overcomes several issues associated with applying machine learning methods to viral populations. Image data also help in the visualization of genomic data. Experimental results demonstrate that the proposed method can be successfully applied to different problems in molecular epidemiology and surveillance of viral diseases. Simple binary classifiers and clustering techniques applied to the image data are equally or more accurate than other models.
Affiliation(s)
- Sunitha Basodi
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Pelin Icer Baykal
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 11991, Russia
- Pavel Skums
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
- Yi Pan
  - Department of Computer Science, Georgia State University, 25 Park Place NE, Atlanta, GA, 30303, USA
|
27
|
García-Crespo C, Soria ME, Gallego I, de Ávila AI, Martínez-González B, Vázquez-Sirvent L, Gómez J, Briones C, Gregori J, Quer J, Perales C, Domingo E. Dissimilar Conservation Pattern in Hepatitis C Virus Mutant Spectra, Consensus Sequences, and Data Banks. J Clin Med 2020; 9:3450. [PMID: 33121037 PMCID: PMC7692060 DOI: 10.3390/jcm9113450] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/08/2020] [Revised: 10/15/2020] [Accepted: 10/20/2020] [Indexed: 02/07/2023] Open
Abstract
The influence of quasispecies dynamics on long-term virus diversification in nature is a largely unexplored question. Specifically, whether intra-host nucleotide and amino acid variation in quasispecies fit the variation observed in consensus sequences or data bank alignments is unknown. Genome conservation and dynamics simulations are used for the computational design of universal vaccines, therapeutic antibodies and pan-genomic antiviral agents. The expectation is that selection of escape mutants will be limited when mutations at conserved residues are required. This strategy assumes long-term (epidemiologically relevant) conservation but, critically, does not consider short-term (quasispecies-dictated) residue conservation. We calculated mutant frequencies of individual loci from mutant spectra of hepatitis C virus (HCV) populations passaged in cell culture and from infected patients. Nucleotide or amino acid conservation in consensus sequences of the same populations, or in the Los Alamos HCV data bank did not match residue conservation in mutant spectra. The results relativize the concept of sequence conservation in viral genetics and suggest that residue invariance in data banks is an insufficient basis for the design of universal viral ligands for clinical purposes. Our calculations suggest relaxed mutational restrictions during quasispecies dynamics, which may contribute to higher calculated short-term than long-term viral evolutionary rates.
Affiliation(s)
- Carlos García-Crespo
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
- María Eugenia Soria
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
  - Department of Clinical Microbiology, IIS-Fundación Jiménez Díaz, UAM. Av. Reyes Católicos 2, 28040 Madrid, Spain
- Isabel Gallego
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
- Ana Isabel de Ávila
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
- Brenda Martínez-González
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
  - Department of Clinical Microbiology, IIS-Fundación Jiménez Díaz, UAM. Av. Reyes Católicos 2, 28040 Madrid, Spain
- Lucía Vázquez-Sirvent
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
- Jordi Gómez
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Department of Molecular Biology, Instituto de Parasitología y Biomedicina ‘López-Neyra’ (CSIC), Parque Tecnológico Ciencias de la Salud, Armilla, 18016 Granada, Spain
- Carlos Briones
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Department of Molecular Evolution, Centro de Astrobiología (CAB, CSIC-INTA), Torrejón de Ardoz, 28850 Madrid, Spain
- Josep Gregori
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Liver Unit, Liver Diseases-Viral Hepatitis, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
  - Roche Diagnostics, S.L., Sant Cugat del Vallés, 08174 Barcelona, Spain
- Josep Quer
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Liver Unit, Liver Diseases-Viral Hepatitis, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Celia Perales
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
  - Department of Clinical Microbiology, IIS-Fundación Jiménez Díaz, UAM. Av. Reyes Católicos 2, 28040 Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Corresponding author
- Esteban Domingo
  - Department of Interactions with the environment, Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, 28029 Madrid, Spain
  - Corresponding author
|
28
|
Alamil M, Hughes J, Berthier K, Desbiez C, Thébaud G, Soubeyrand S. Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180258. [PMID: 31056055 PMCID: PMC6553606 DOI: 10.1098/rstb.2018.0258] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/23/2022] Open
Abstract
Pathogen sequence data have been exploited to infer who infected whom, by using empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight on epidemiological links than a single sequence per host. In general, a mechanistic viewpoint to transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded on statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained from contact tracing, for example, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable in the case of a risk of erroneous mechanistic assumptions, it is sufficiently parsimonious to allow the handling of big datasets in the future, and it is versatile enough to be applied to very different contexts from animal, human and plant epidemiology. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’. This issue is linked with the subsequent theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’.
Affiliation(s)
- M Alamil
  - BioSP, INRA, 84914 Avignon, France
- J Hughes
  - MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- K Berthier
  - Pathologie Végétale, INRA, 84140 Montfavet, France
- C Desbiez
  - Pathologie Végétale, INRA, 84140 Montfavet, France
- G Thébaud
  - BGPI, INRA, Univ. Montpellier, SupAgro, Cirad, 34398 Montpellier, France
|
29
|
Abstract
PURPOSE OF REVIEW Within-host diversity complicates transmission models because it recognizes that between-host virus phylogenies are not identical to the transmission history among the infected hosts. This review presents the biological and theoretical foundations for recent development in this field, and shows that modern phylodynamic methods are capable of inferring realistic transmission histories from HIV sequence data. RECENT FINDINGS Transmission of single or multiple genetic variants from a donor's HIV population results in donor-recipient phylogenies with combinations of monophyletic, paraphyletic, and polyphyletic patterns. Large-scale simulations and analyses of many real HIV datasets have established that transmission direction, directness, or common source often can be inferred based on HIV sequence data. Phylodynamic reconstruction of HIV transmissions that include within-host HIV diversity have recently been established and made available in several software packages. SUMMARY Phylodynamic methods that include realistic features of HIV genetic diversification have come of age, significantly improving inference of key epidemiological parameters. This opens the door to more accurate surveillance and better-informed prevention campaigns.
|
30
|
Phylogenetic and Demographic Characterization of Directed HIV-1 Transmission Using Deep Sequences from High-Risk and General Population Cohorts/Groups in Uganda. Viruses 2020; 12:331. [PMID: 32197553 PMCID: PMC7150763 DOI: 10.3390/v12030331] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/14/2020] [Revised: 03/13/2020] [Accepted: 03/16/2020] [Indexed: 12/12/2022] Open
Abstract
Across sub-Saharan Africa, key populations with elevated HIV-1 incidence and/or prevalence have been identified, but their contribution to disease spread remains unclear. We performed viral deep-sequence phylogenetic analyses to quantify transmission dynamics between the general population (GP), fisherfolk communities (FF), and women at high risk of infection and their clients (WHR) in central and southwestern Uganda. Between August 2014 and August 2017, 6185 HIV-1 positive individuals were enrolled in 3 GP and 10 FF communities, 3 WHR enrollment sites. A total of 2531 antiretroviral therapy (ART) naïve participants with plasma viral load >1000 copies/mL were deep-sequenced. One hundred and twenty-three transmission networks were reconstructed, including 105 phylogenetically highly supported source–recipient pairs. Only one pair involved a WHR and male participant, suggesting that improved population sampling is needed to assess empirically the role of WHR to the transmission dynamics. More transmissions were observed from the GP communities to FF communities than vice versa, with an estimated flow ratio of 1.56 (95% CrI 0.68–3.72), indicating that fishing communities on Lake Victoria are not a net source of transmission flow to neighboring communities further inland. Men contributed disproportionally to HIV-1 transmission flow regardless of age, suggesting that prevention efforts need to better aid men to engage with and stay in care.
|
31
|
de Bernardi Schneider A, Ford CT, Hostager R, Williams J, Cioce M, Çatalyürek ÜV, Wertheim JO, Janies D. StrainHub: a phylogenetic tool to construct pathogen transmission networks. Bioinformatics 2020; 36:945-947. [PMID: 31418766 PMCID: PMC8215912 DOI: 10.1093/bioinformatics/btz646] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/07/2019] [Revised: 08/06/2019] [Accepted: 08/14/2019] [Indexed: 01/30/2023] Open
Abstract
SUMMARY In exploring the epidemiology of infectious diseases, networks have been used to reconstruct contacts among individuals and/or populations. Summarizing networks using pathogen metadata (e.g. host species and place of isolation) and a phylogenetic tree is a nascent, alternative approach. In this paper, we introduce a tool for reconstructing transmission networks in arbitrary space from phylogenetic information and metadata. Our goals are to provide a means of deriving new insights and infection control strategies based on the dynamics of the pathogen lineages derived from networks and centrality metrics. We created a web-based application, called StrainHub, in which a user can input a phylogenetic tree based on genetic or other data along with characters derived from metadata using their preferred tree search method. StrainHub generates a transmission network based on character state changes in metadata, such as place or source of isolation, mapped on the phylogenetic tree. The user has the option to calculate centrality metrics on the nodes including betweenness, closeness, degree and a new metric, the source/hub ratio. The outputs include the network with values for metrics on its nodes and the tree with characters reconstructed. All of these results can be exported for further analysis. AVAILABILITY AND IMPLEMENTATION strainhub.io and https://github.com/abschneider/StrainHub.
Affiliation(s)
- Colby T Ford
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Reilly Hostager
  - Department of Medicine, University of California, San Diego, San Diego, CA 92103, USA
- John Williams
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Michael Cioce
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Ümit V Çatalyürek
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Joel O Wertheim
  - Department of Medicine, University of California, San Diego, San Diego, CA 92103, USA
- Daniel Janies
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
32
|
Pérez-Losada M, Arenas M, Galán JC, Bracho MA, Hillung J, García-González N, González-Candelas F. High-throughput sequencing (HTS) for the analysis of viral populations. Infect Genet Evol 2020; 80:104208. [PMID: 32001386 DOI: 10.1016/j.meegid.2020.104208] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/30/2019] [Revised: 01/21/2020] [Accepted: 01/24/2020] [Indexed: 12/12/2022]
Abstract
The development of High-Throughput Sequencing (HTS) technologies is having a major impact on the genomic analysis of viral populations. Current HTS platforms can capture nucleic acid variation across millions of genes for both selected amplicons and full viral genomes. HTS has already facilitated the discovery of new viruses, hinted new taxonomic classifications and provided a deeper and broader understanding of their diversity, population and genetic structure. Hence, HTS has already replaced standard Sanger sequencing in basic and applied research fields, but the next step is its implementation as a routine technology for the analysis of viruses in clinical settings. The most likely application of this implementation will be the analysis of viral genomics, because the huge population sizes, high mutation rates and very fast replacement of viral populations have demonstrated the limited information obtained with Sanger technology. In this review, we describe new technologies and provide guidelines for the high-throughput sequencing and genetic and evolutionary analyses of viral populations and metaviromes, including software applications. With the development of new HTS technologies, new and refurbished molecular and bioinformatic tools are also constantly being developed to process and integrate HTS data. These allow assembling viral genomes and inferring viral population diversity and dynamics. Finally, we also present several applications of these approaches to the analysis of viral clinical samples including transmission clusters and outbreak characterization.
Affiliation(s)
- Marcos Pérez-Losada
- Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão 4485-661, Portugal
| | - Miguel Arenas
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain; Biomedical Research Center (CINBIO), University of Vigo, 36310 Vigo, Spain.
| | - Juan Carlos Galán
- Microbiology Service, Hospital Ramón y Cajal, Madrid, Spain; CIBER in Epidemiology and Public Health, Spain.
| | - Mª Alma Bracho
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain.
| | - Julia Hillung
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Neris García-González
- Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| | - Fernando González-Candelas
- CIBER in Epidemiology and Public Health, Spain; Joint Research Unit "Infection and Public Health" FISABIO-University of Valencia, Valencia, Spain; Institute for Integrative Systems Biology (I2SysBio), CSIC-University of Valencia, Valencia, Spain.
| |
|
33
|
Tan MP, Wong LL, Razali SA, Afiqah-Aleng N, Mohd Nor SA, Sung YY, Van de Peer Y, Sorgeloos P, Danish-Daniel M. Applications of Next-Generation Sequencing Technologies and Computational Tools in Molecular Evolution and Aquatic Animals Conservation Studies: A Short Review. Evol Bioinform Online 2019; 15:1176934319892284. [PMID: 31839703 PMCID: PMC6896124 DOI: 10.1177/1176934319892284] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/07/2019] [Accepted: 11/12/2019] [Indexed: 12/21/2022] Open
Abstract
Aquatic ecosystems that form major biodiversity hotspots are critically threatened due to environmental and anthropogenic stressors. We believe that, in this genomic era, computational methods can be applied to promote aquatic biodiversity conservation by addressing questions related to the evolutionary history of aquatic organisms at the molecular level. However, huge amounts of genomics data generated can only be discerned through the use of bioinformatics. Here, we examine the applications of next-generation sequencing technologies and bioinformatics tools to study the molecular evolution of aquatic animals and discuss the current challenges and future perspectives of using bioinformatics toward aquatic animal conservation efforts.
Affiliation(s)
- Min Pau Tan
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia; Faculty of Fisheries and Food Science, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
| | - Li Lian Wong
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia; Institute of Tropical Aquaculture, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
| | - Siti Aisyah Razali
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
| | - Nor Afiqah-Aleng
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
| | - Siti Azizah Mohd Nor
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
| | - Yeong Yik Sung
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
| | - Yves Van de Peer
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia; Center for Plant Systems Biology, VIB, Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Patrick Sorgeloos
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia; Laboratory of Aquaculture & Artemia Reference Center, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
| | - Muhd Danish-Daniel
- Institute of Marine Biotechnology, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia; Faculty of Fisheries and Food Science, Universiti Malaysia Terengganu, Kuala Nerus, Malaysia
| |
|
34
|
Abstract
Viral quasispecies refers to a population structure that consists of extremely large numbers of variant genomes, termed mutant spectra, mutant swarms or mutant clouds. Fueled by high mutation rates, mutants arise continually, and they change in relative frequency as viral replication proceeds. The term quasispecies was adopted from a theory of the origin of life in which primitive replicons consisted of mutant distributions, as found experimentally with present-day RNA viruses. The theory provided a new definition of wild type, and a conceptual framework for the interpretation of the adaptive potential of RNA viruses that contrasted with classical studies based on consensus sequences. Standard clonal analyses and deep sequencing methodologies have confirmed the presence of myriads of mutant genomes in viral populations, and their participation in adaptive processes. The quasispecies concept applies to any biological entity, but its impact is more evident when the genome size is limited and the mutation rate is high. This is the case for the RNA viruses, which are ubiquitous in our biosphere and comprise many important pathogens. In virology, quasispecies are defined as complex distributions of closely related variant genomes subjected to genetic variation, competition and selection, and that may act as a unit of selection. Despite being an integral part of their replication, high mutation rates have an upper limit compatible with inheritable information. Crossing such a limit leads to RNA virus extinction, a transition that is the basis of an antiviral design termed lethal mutagenesis.
Affiliation(s)
- Esteban Domingo
- Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, Madrid, Spain
| | - Celia Perales
- Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, Madrid, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd) del Instituto de Salud Carlos III, Madrid, Spain
- Department of Clinical Microbiology, IIS-Fundación Jiménez Díaz, UAM, Madrid, Spain
| |
|
35
|
Abstract
One approach to the reconstruction of infectious disease transmission trees from pathogen genomic data has been to use a phylogenetic tree, reconstructed from pathogen sequences, and annotate its internal nodes to provide a reconstruction of which host each lineage was in at each point in time. If only one pathogen lineage can be transmitted to a new host (i.e., the transmission bottleneck is complete), this corresponds to partitioning the nodes of the phylogeny into connected regions, each of which represents evolution in an individual host. These partitions define the possible transmission trees that are consistent with a given phylogenetic tree. However, the mathematical properties of the transmission trees given a phylogeny remain largely unexplored. Here, we describe a procedure to calculate the number of possible transmission trees for a given phylogeny, and we then show how to uniformly sample from these transmission trees. The procedure is outlined for situations where one sample is available from each host and trees do not have branch lengths, and we also provide extensions for incomplete sampling, multiple sampling, and the application to time trees in a situation where limits on the period during which each host could have been infected and infectious are known. The sampling algorithm is available as an R package (STraTUS).
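The counting step described in this abstract can be illustrated with a small dynamic program. The sketch below is not the STraTUS implementation. It assumes the simplest setting the abstract names (one sample per host, no branch lengths, a complete transmission bottleneck), and the nested-tuple tree encoding and function names are hypothetical. For each node it tracks two counts: partitions in which the node's region already contains a tip from its own subtree, and partitions in which the node's region contains no tip yet and must be completed from above.

```python
from math import prod


def count_regions(node):
    """Return (with_tip, without_tip) for the subtree rooted at `node`.

    A tip is a string; an internal node is a tuple of child subtrees
    (a hypothetical encoding chosen for this sketch).
    with_tip:    partitions where the node's region holds exactly one tip
                 of this subtree, so the subtree is fully assigned.
    without_tip: partitions where the node's region holds no tip of this
                 subtree and must be joined to a region higher up.
    """
    if isinstance(node, str):
        # A tip always sits in the region of its own host.
        return 1, 0

    child_counts = [count_regions(child) for child in node]
    # A child that does not supply the node's tip is either cut off
    # (its subtree is fully assigned) or joined to the node while tipless.
    cut_or_tipless = [a + b for a, b in child_counts]

    without_tip = prod(cut_or_tipless)

    with_tip = 0
    for i, (a, _) in enumerate(child_counts):
        # Child i supplies the tip for the node's region; the others do not.
        others = prod(cut_or_tipless[:i] + cut_or_tipless[i + 1:])
        with_tip += a * others
    return with_tip, without_tip


def count_transmission_trees(tree):
    """Number of partitions in which every region contains exactly one tip."""
    return count_regions(tree)[0]


print(count_transmission_trees((("a", "b"), "c")))  # prints 5
```

For the three-host tree in the usage line, the five partitions can be checked by hand: the internal node joins host a or host b, the root joins the same host or host c, or both internal nodes join host c.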
Affiliation(s)
- Matthew D Hall
- Nuffield Department of Medicine, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
| | - Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, Canada
| |
|
36
|
Ratmann O, Grabowski MK, Hall M, Golubchik T, Wymant C, Abeler-Dörner L, Bonsall D, Hoppe A, Brown AL, de Oliveira T, Gall A, Kellam P, Pillay D, Kagaayi J, Kigozi G, Quinn TC, Wawer MJ, Laeyendecker O, Serwadda D, Gray RH, Fraser C. Inferring HIV-1 transmission networks and sources of epidemic spread in Africa with deep-sequence phylogenetic analysis. Nat Commun 2019; 10:1411. [PMID: 30926780 PMCID: PMC6441045 DOI: 10.1038/s41467-019-09139-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 10/05/2018] [Accepted: 02/22/2019] [Indexed: 11/09/2022] Open
Abstract
To prevent new infections with human immunodeficiency virus type 1 (HIV-1) in sub-Saharan Africa, UNAIDS recommends targeting interventions to populations that are at high risk of acquiring and passing on the virus. Yet it is often unclear who and where these 'source' populations are. Here we demonstrate how viral deep-sequencing can be used to reconstruct HIV-1 transmission networks and to infer the direction of transmission in these networks. We are able to deep-sequence virus from a large population-based sample of infected individuals in Rakai District, Uganda, reconstruct partial transmission networks, and infer the direction of transmission within them at an estimated error rate of 16.3% [8.8-28.3%]. With this error rate, deep-sequence phylogenetics cannot be used against individuals in legal contexts, but is sufficiently low for population-level inferences into the sources of epidemic spread. The technique presents new opportunities for characterizing source populations and for targeting of HIV-1 prevention interventions in Africa.
Affiliation(s)
- Oliver Ratmann
- Department of Mathematics, Imperial College London, London, SW72AZ, UK.
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, W21PG, UK.
| | - M Kate Grabowski
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, 21205-2196, USA
- Rakai Health Sciences Program, Entebbe, P.O.Box 49, Uganda
| | - Matthew Hall
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Old Road Campus, University of Oxford, Oxford, OX3 7BN, UK
| | - Tanya Golubchik
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Old Road Campus, University of Oxford, Oxford, OX3 7BN, UK
| | - Chris Wymant
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, W21PG, UK
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Old Road Campus, University of Oxford, Oxford, OX3 7BN, UK
| | - Lucie Abeler-Dörner
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Old Road Campus, University of Oxford, Oxford, OX3 7BN, UK
| | - David Bonsall
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Old Road Campus, University of Oxford, Oxford, OX3 7BN, UK
| | - Anne Hoppe
- Division of Infection and Immunity, University College London, London, WC1E 6BT, UK
| | - Andrew Leigh Brown
- School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3FF, UK
| | - Tulio de Oliveira
- College of Health Sciences, University of KwaZulu-Natal, Durban, 4041, South Africa
| | - Astrid Gall
- European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Paul Kellam
- Department of Medicine, Imperial College London, London, W12 0HS, UK
| | - Deenan Pillay
- Division of Infection and Immunity, University College London, London, WC1E 6BT, UK
- Africa Health Research Institute, Private Bag X7, Durban, 4013, South Africa
| | - Joseph Kagaayi
- Rakai Health Sciences Program, Entebbe, P.O.Box 49, Uganda
| | - Godfrey Kigozi
- Rakai Health Sciences Program, Entebbe, P.O.Box 49, Uganda
| | - Thomas C Quinn
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, 21205-2196, USA
- Division of Intramural Research, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, 20892-9806, USA
| | - Maria J Wawer
- Rakai Health Sciences Program, Entebbe, P.O.Box 49, Uganda
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
| | - Oliver Laeyendecker
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, 21205-2196, USA
- Division of Intramural Research, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD, 20892-9806, USA
| | - David Serwadda
- Rakai Health Sciences Program, Entebbe, P.O.Box 49, Uganda
- Makerere University School of Public Health, Kampala, 8HQG+3V, Uganda
| | - Ronald H Gray
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, 21205-2196, USA
- Rakai Health Sciences Program, Entebbe, P.O.Box 49, Uganda
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
| | - Christophe Fraser
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, Old Road Campus, University of Oxford, Oxford, OX3 7BN, UK
| |
|
37
|
Tsyvina V, Campo DS, Sims S, Zelikovsky A, Khudyakov Y, Skums P. Fast estimation of genetic relatedness between members of heterogeneous populations of closely related genomic variants. BMC Bioinformatics 2018; 19:360. [PMID: 30343669 PMCID: PMC6196405 DOI: 10.1186/s12859-018-2333-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/27/2023] Open
Abstract
Background: Many biological analysis tasks require extraction of families of genetically similar sequences from large datasets produced by Next-generation Sequencing (NGS). Such tasks include detecting viral transmissions by analyzing all genetically close pairs of sequences from viral datasets sampled from infected individuals, and studying the evolution of viruses or immune repertoires by analyzing networks of intra-host viral variants or antibody clonotypes formed by genetically close sequences. The most obvious naive algorithms to extract such sequence families are impractical in light of the massive size of modern NGS datasets. Results: In this paper, we present a fast and scalable k-mer-based framework to perform such sequence similarity queries efficiently, which specifically targets data produced by deep sequencing of heterogeneous populations such as viruses. It shows better filtering quality and time performance compared with other tools. The tool is freely available for download at https://github.com/vyacheslav-tsivina/signature-sj Conclusion: The proposed tool allows for efficient detection of genetic relatedness between genomic samples produced by deep sequencing of heterogeneous populations. It should be especially useful for analysis of relatedness of genomes of viruses with unevenly distributed variable genomic regions, such as HIV and HCV. In the future, we envision that, besides applications in molecular epidemiology, the tool can also be adapted to immunosequencing and metagenomics data.
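The idea of a k-mer filter for similarity queries can be sketched generically. The code below is not the signature-sj algorithm. It is a minimal illustration, assuming equal-length sequences and Hamming distance, of the standard q-gram lemma: one substitution changes at most k overlapping k-mers, so two sequences within t substitutions must share at least L - k + 1 - k*t k-mers. All function names are hypothetical. The pairwise loop is still quadratic, so it shows only the filter-then-verify step, not the indexing that makes a real tool scale.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Multiset of all overlapping k-mers in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(a, b, k):
    """Size of the multiset intersection of the two k-mer profiles."""
    return sum((kmer_counts(a, k) & kmer_counts(b, k)).values())


def may_be_within(a, b, k, t):
    """Cheap necessary condition for Hamming distance <= t (q-gram lemma)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    threshold = len(a) - k + 1 - k * t
    return shared_kmers(a, b, k) >= threshold


def hamming(a, b):
    """Exact number of mismatching positions."""
    return sum(x != y for x, y in zip(a, b))


def close_pairs(seqs, k, t):
    """Index pairs within Hamming distance t: filter first, then verify."""
    pairs = []
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if may_be_within(seqs[i], seqs[j], k, t) and hamming(seqs[i], seqs[j]) <= t:
                pairs.append((i, j))
    return pairs


reads = ["ACGTACGTAC", "ACGTTCGTAC", "TTTTTTTTTT"]
print(close_pairs(reads, k=3, t=1))  # prints [(0, 1)]
```

The filter never discards a truly close pair, so the exact check is needed only for pairs that pass it.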
Affiliation(s)
- Viachaslau Tsyvina
- Computer Science Department, Georgia State University, 25 Park Place NE, Atlanta, 30303, GA, USA.
| | - David S Campo
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, 30333, GA, USA
| | - Seth Sims
- Computer Science Department, Georgia State University, 25 Park Place NE, Atlanta, 30303, GA, USA; Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, 30333, GA, USA
| | - Alex Zelikovsky
- Computer Science Department, Georgia State University, 25 Park Place NE, Atlanta, 30303, GA, USA
| | - Yury Khudyakov
- Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, 30333, GA, USA
| | - Pavel Skums
- Computer Science Department, Georgia State University, 25 Park Place NE, Atlanta, 30303, GA, USA; Molecular Epidemiology and Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, 30333, GA, USA
| |
|
38
|
De Maio N, Worby CJ, Wilson DJ, Stoesser N. Bayesian reconstruction of transmission within outbreaks using genomic variants. PLoS Comput Biol 2018; 14:e1006117. [PMID: 29668677 PMCID: PMC5927459 DOI: 10.1371/journal.pcbi.1006117] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/09/2017] [Revised: 04/30/2018] [Accepted: 04/03/2018] [Indexed: 01/19/2023] Open
Abstract
Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.
We present a new tool to reconstruct transmission events within outbreaks. Our approach makes use of pathogen genetic information, notably genetic variants at low frequency within host that are usually discarded, and combines it with epidemiological information of host exposure to infection. This leads to accurate reconstruction of transmission even in cases where abundant within-host pathogen genetic variation and weak transmission bottlenecks (multiple pathogen units colonising a new host at transmission) would otherwise make inference difficult due to the transmission history differing from the pathogen evolution history inferred from pathogen isolates. Also, the use of within-host pathogen genomic variants increases the resolution of the reconstruction of the transmission tree even in scenarios with limited within-outbreak pathogen genetic diversity: within-host pathogen populations that appear identical at the level of consensus sequences can be discriminated using within-host variants. Our Bayesian approach provides a measure of the confidence in different possible transmission histories, and is published as open source software. We show with simulations and with an analysis of the beginning of the 2014 Ebola outbreak that our approach is applicable in many scenarios, improves our understanding of transmission dynamics, and will contribute to finding and limiting sources and routes of transmission, and therefore preventing the spread of infectious disease.
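The challenge of separating genuine low-frequency variants from sequencing error, mentioned in the abstract above, can be made concrete with a simple stand-alone check. This is not BadTrIP's model, which treats sequencing error inside a full Bayesian framework. The sketch below only asks how likely it is that errors alone would produce at least the observed number of alternative-allele reads, assuming independent errors at a fixed per-base rate. The function names, the default error rate of 0.01 and the cutoff of 0.001 are illustrative assumptions.

```python
from math import comb


def error_tail_probability(alt_reads, depth, error_rate):
    """P(at least `alt_reads` erroneous reads among `depth`) under a binomial model."""
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(alt_reads, depth + 1)
    )


def looks_genuine(alt_reads, depth, error_rate=0.01, alpha=1e-3):
    """Flag a site as a candidate within-host variant.

    error_rate and alpha are assumed values for illustration only;
    real pipelines calibrate them per platform and per site.
    """
    return error_tail_probability(alt_reads, depth, error_rate) < alpha


# One alternative read in 100 is easily explained by error alone;
# ten alternative reads in 100 are not.
print(looks_genuine(1, 100), looks_genuine(10, 100))
```

A fixed cutoff like this discards exactly the low-frequency signal the abstract argues is informative, which is one motivation for modelling error jointly with transmission instead.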
Affiliation(s)
- Nicola De Maio
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Colin J Worby
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
| | - Daniel J Wilson
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicole Stoesser
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| |
|