1. Xu P, Liang S, Hahn A, Zhao V, Lo WTJ, Haller BC, Sobkowiak B, Chitwood MH, Colijn C, Cohen T, Rhee KY, Messer PW, Wells MT, Clark AG, Kim J. e3SIM: epidemiological-ecological-evolutionary simulation framework for genomic epidemiology. bioRxiv 2024:2024.06.29.601123. [PMID: 39005464 PMCID: PMC11244936 DOI: 10.1101/2024.06.29.601123]
Abstract
Infectious disease dynamics are driven by the complex interplay of epidemiological, ecological, and evolutionary processes. Accurately modeling these interactions is crucial for understanding pathogen spread and informing public health strategies. However, existing simulators often fail to capture the dynamic interplay between these processes, resulting in oversimplified models that do not fully reflect real-world complexities in which the pathogen's genetic evolution dynamically influences disease transmission. We introduce the epidemiological-ecological-evolutionary simulator (e3SIM), an open-source framework that concurrently models the transmission dynamics and molecular evolution of pathogens within a host population while integrating environmental factors. Using an agent-based, discrete-generation, forward-in-time approach, e3SIM incorporates compartmental models, host-population contact networks, and quantitative-trait models for pathogens. This integration allows for realistic simulations of disease spread and pathogen evolution. Key features include a modular and scalable design, flexibility in modeling various epidemiological and population-genetic complexities, incorporation of time-varying environmental factors, and a user-friendly graphical interface. We demonstrate e3SIM's capabilities through simulations of realistic outbreak scenarios with SARS-CoV-2 and Mycobacterium tuberculosis, illustrating its flexibility for studying the genomic epidemiology of diverse pathogen types.
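The abstract describes an agent-based, discrete-generation, forward-in-time model on a host contact network. The minimal Python sketch below illustrates only that general scheme with a plain SIR compartmental model; it is not e3SIM code or its configuration format. The function name `simulate_sir_generations`, the per-contact transmission probability `beta`, and the assumption that a host stays infectious for exactly one generation are illustrative choices.

```python
import random
from typing import Dict, List, Set


def simulate_sir_generations(
    contacts: Dict[int, List[int]],
    seeds: Set[int],
    beta: float,
    n_generations: int,
    rng: random.Random,
) -> List[Dict[str, int]]:
    """Discrete-generation SIR on a contact network (illustrative sketch).

    Assumptions: each infected host is infectious for exactly one generation
    and then recovers; every contact with a susceptible neighbour transmits
    independently with probability beta.
    """
    susceptible = set(contacts) - seeds
    infected = set(seeds)
    recovered: Set[int] = set()
    history = [{"S": len(susceptible), "I": len(infected), "R": len(recovered)}]
    for _ in range(n_generations):
        newly_infected: Set[int] = set()
        # All decisions use the state at the start of the generation (synchronous update).
        for host in infected:
            for neighbour in contacts[host]:
                if neighbour in susceptible and rng.random() < beta:
                    newly_infected.add(neighbour)
        susceptible -= newly_infected
        recovered |= infected
        infected = newly_infected
        history.append({"S": len(susceptible), "I": len(infected), "R": len(recovered)})
    return history


# Usage: a ring of 10 hosts, one seed, and beta = 1.0 so every contact transmits.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
trajectory = simulate_sir_generations(ring, seeds={0}, beta=1.0, n_generations=5, rng=random.Random(1))
```

A full framework would add per-host pathogen genomes and let them modify `beta`; the sketch keeps only the epidemiological layer.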
Affiliation(s)
- Peiyu Xu
- Department of Molecular Biology & Genetics, Cornell University, Ithaca, NY, USA
- Shenni Liang
- Department of Computational Science, Cornell University, Ithaca, NY, USA
- Andrew Hahn
- Department of Computational Science, Cornell University, Ithaca, NY, USA
- Vivian Zhao
- Department of Computational Science, Cornell University, Ithaca, NY, USA
- Wai Tung ‘Jack’ Lo
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Benjamin C. Haller
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Benjamin Sobkowiak
- Department of Epidemiology of Microbial Disease, Yale School of Public Health, New Haven, CT, USA
- Melanie H. Chitwood
- Department of Epidemiology of Microbial Disease, Yale School of Public Health, New Haven, CT, USA
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Ted Cohen
- Department of Epidemiology of Microbial Disease, Yale School of Public Health, New Haven, CT, USA
- Kyu Y. Rhee
- Department of Medicine, Weill Cornell Medicine, New York, NY, USA
- Philipp W. Messer
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Martin T. Wells
- Department of Statistics and Data Science, Cornell University, Ithaca, NY, USA
- Andrew G. Clark
- Department of Molecular Biology & Genetics, Cornell University, Ithaca, NY, USA
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
- Jaehee Kim
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
2. Sobkowiak B, Haghmaram P, Prystajecky N, Zlosnik JEA, Tyson J, Hoang LMN, Colijn C. The utility of SARS-CoV-2 genomic data for informative clustering under different epidemiological scenarios and sampling. Infect Genet Evol 2023; 113:105484. [PMID: 37531976 DOI: 10.1016/j.meegid.2023.105484] [Received: 05/18/2023] [Revised: 07/25/2023] [Accepted: 07/30/2023]
Abstract
OBJECTIVES Clustering pathogen sequence data is a common practice in epidemiology to gain insights into the genetic diversity and evolutionary relationships among pathogens. We can find groups of cases with a shared transmission history and common origin, as well as identify transmission hotspots. Motivated by the experience of clustering SARS-CoV-2 cases using whole genome sequence data during the COVID-19 pandemic to aid public health investigations, we investigated how differences in epidemiology and sampling can influence the composition of the clusters that are identified. METHODS We performed genomic clustering on simulated SARS-CoV-2 outbreaks produced with different transmission rates and levels of genomic diversity, while also varying the proportion of cases sampled. RESULTS In single outbreaks with a low transmission rate, decreasing the sampling fraction resulted in multiple, separate clusters being identified where intermediate cases in transmission chains were missed. Outbreaks simulated with a high transmission rate were more robust to changes in the sampling fraction and largely resulted in a single cluster that included all sampled outbreak cases. When considering multiple outbreaks in a sampled jurisdiction seeded by different introductions, low genomic diversity between introduced cases caused outbreaks to be merged into large clusters. If the transmission rate, sampling fraction, and diversity between introductions were all low, a combination of the spurious break-up of outbreaks and the linking of closely related cases in different outbreaks resulted in clusters that may appear informative, but these did not reflect the true underlying population structure. Conversely, genomic clusters matched the true population structure when there was relatively high diversity between introductions and a high transmission rate.
CONCLUSION Differences in epidemiology and sampling can impact our ability to identify genomic clusters that describe the underlying population structure. These findings can help to guide recommendations for the use of pathogen clustering in public health investigations.
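The low-transmission result (unsampled intermediate cases splitting one outbreak into several clusters) can be illustrated with threshold-based single-linkage clustering on pairwise SNP distances, a common genomic clustering approach. This is a toy sketch, not the clustering pipeline used in the study; the 2-SNP threshold, the function names, and the toy sequences are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(seqs: Dict[str, str], threshold: int) -> List[Set[str]]:
    """Single-linkage clusters: link any pair within `threshold` SNPs (union-find)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy transmission chain A -> B -> C, each step adding 2 SNPs.
chain = {"A": "AAAAAAAA", "B": "CCAAAAAA", "C": "CCGGAAAA"}
full = threshold_clusters(chain, threshold=2)
# Drop the intermediate case B, as if it had not been sampled.
missing_b = threshold_clusters({k: v for k, v in chain.items() if k != "B"}, threshold=2)
```

With B present, A and C are linked through B and form one cluster. Without B, the only remaining distance (A to C) is 4 SNPs, above the threshold, so the same outbreak is reported as two clusters.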
Affiliation(s)
- Pouya Haghmaram
- Department of Mathematics, Simon Fraser University, Burnaby, Canada
- Natalie Prystajecky
- BC Centre for Disease Control Public Health Laboratory, BC Centre for Disease Control, Vancouver, Canada; Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of British Columbia, Canada
- James E A Zlosnik
- BC Centre for Disease Control Public Health Laboratory, BC Centre for Disease Control, Vancouver, Canada
- John Tyson
- BC Centre for Disease Control Public Health Laboratory, BC Centre for Disease Control, Vancouver, Canada
- Linda M N Hoang
- BC Centre for Disease Control Public Health Laboratory, BC Centre for Disease Control, Vancouver, Canada; Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of British Columbia, Canada
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, Canada
3. Alamil M, Thébaud G, Berthier K, Soubeyrand S. Characterizing viral within-host diversity in fast and non-equilibrium demo-genetic dynamics. Front Microbiol 2022; 13:983938. [PMID: 36274731 PMCID: PMC9581327 DOI: 10.3389/fmicb.2022.983938] [Received: 07/04/2022] [Accepted: 09/08/2022]
Abstract
High-throughput sequencing has opened the route for a deep assessment of within-host genetic diversity that can be used, e.g., to characterize microbial communities and to infer transmission links in infectious disease outbreaks. The performance of such characterizations and inferences cannot, in general, be assessed analytically and is often grounded in computer-intensive evaluations. Being able to simulate within-host genetic diversity across time under various demo-genetic assumptions is therefore paramount for assessing the performance of the approaches of interest. In this context, we built an original model that can be simulated to investigate the temporal evolution of genotypes and their frequencies under various demo-genetic assumptions. The model describes the growth and the mutation of genotypes at the nucleotide resolution conditional on an overall within-host viral kinetics, and can be tuned to generate fast non-equilibrium demo-genetic dynamics. We ran simulations of this model and computed classic diversity indices to characterize the temporal variation of within-host genetic diversity (from high-throughput amplicon sequences) of virus populations under three demographic kinetic models of viral infection. Our results highlight how demographic (viral load) and genetic (mutation, selection, or drift) factors drive variations in within-host diversity during the course of an infection. In particular, we observed a non-monotonic relationship between pathogen population size and genetic diversity, and a reduction of the impact of mutation on diversity when a non-specific host immune response is activated. The large variation in the diversity patterns generated in our simulations suggests that the underlying model provides a flexible basis to produce very diverse demo-genetic scenarios and test, for instance, methods for the inference of transmission links during outbreaks.
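The abstract mentions "classic diversity indices" without naming them. Assuming two common choices, the sketch below computes Shannon entropy and Gini-Simpson diversity from read counts per genotype at one time point; applying it at each sampling time would give a diversity trajectory. The function names and the input format (genotype label mapped to read count) are hypothetical.

```python
import math
from typing import Dict


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i) over genotype frequencies p_i."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def gini_simpson_index(counts: Dict[str, int]) -> float:
    """Gini-Simpson diversity 1 - sum(p_i^2): chance two reads drawn with replacement differ."""
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Example: amplicon read counts per genotype at two hypothetical sampling times.
early = {"g1": 100}
late = {"g1": 50, "g2": 50}
diversity = [(shannon_index(s), gini_simpson_index(s)) for s in (early, late)]
```

A single genotype gives zero for both indices, while two equally frequent genotypes give ln 2 and 0.5 respectively.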
Affiliation(s)
- Maryam Alamil
- INRAE, BioSP, Avignon, France
- Department of Mathematics and Computer Science, Alfaisal University, Riyadh, Saudi Arabia
- Correspondence: Maryam Alamil
- Gaël Thébaud
- PHIM Plant Health Institute, INRAE, Univ Montpellier, CIRAD, Institut Agro, IRD, Montpellier, France
4. Cárdenas P, Corredor V, Santos-Vega M. Genomic epidemiological models describe pathogen evolution across fitness valleys. Sci Adv 2022; 8:eabo0173. [PMID: 35857510 PMCID: PMC9278859 DOI: 10.1126/sciadv.abo0173] [Received: 01/07/2022] [Accepted: 05/26/2022]
Abstract
Genomics is fundamentally changing epidemiological research. However, systematically exploring hypotheses in pathogen evolution requires new modeling tools. Models intertwining pathogen epidemiology and genomic evolution can help understand processes such as the emergence of novel pathogen genotypes with higher transmissibility or resistance to treatment. In this work, we present Opqua, a flexible simulation framework that explicitly links epidemiology to sequence evolution and selection. We use Opqua to study determinants of evolution across fitness valleys. We confirm that competition can limit evolution in high-transmission environments and find that low transmission, host mobility, and complex pathogen life cycles facilitate reaching new adaptive peaks through population bottlenecks and decoupling of selective pressures. The results show the potential of genomic epidemiological modeling as a tool in infectious disease research.
Affiliation(s)
- Pablo Cárdenas
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Vladimir Corredor
- Departamento de Salud Pública, Facultad de Medicina, Universidad Nacional de Colombia, Bogotá, D.C., Colombia
- Mauricio Santos-Vega
- Grupo Biología Matemática y Computacional, Departamento Ingeniería Biomédica, Universidad de los Andes, Bogotá, D.C., Colombia
5. Kahn R, Wang R, Leavitt SV, Hanage WP, Lipsitch M. Leveraging Pathogen Sequence and Contact Tracing Data to Enhance Vaccine Trials in Emerging Epidemics. Epidemiology 2021; 32:698-704. [PMID: 34039898 PMCID: PMC8338748 DOI: 10.1097/ede.0000000000001367]
Abstract
INTRODUCTION Advance planning of vaccine trials conducted during outbreaks increases our ability to rapidly define the efficacy and potential impact of a vaccine. Vaccine efficacy against infectiousness (VEI) is an important measure for understanding a vaccine's full impact, yet it is currently not identifiable in many trial designs because it requires knowledge of infectors' vaccination status. Recent advances in genomics have improved our ability to reconstruct transmission networks. We aim to assess whether augmenting trials with pathogen sequence and contact tracing data can permit them to estimate VEI. METHODS We develop a transmission model with a vaccine trial in an outbreak setting, incorporate pathogen sequence data and contact tracing data, and assign probabilities to likely infectors. We then propose and evaluate the performance of an estimator of VEI. RESULTS We find that under perfect knowledge of infector-infectee pairs, we are able to accurately estimate VEI. Use of sequence data results in imperfect reconstruction of transmission networks, biasing estimates of VEI towards the null, with approaches using deep sequence data performing better than approaches using consensus sequence data. Inclusion of contact tracing data reduces the bias. CONCLUSION Pathogen genomics enhances the identifiability of VEI, but imperfect transmission network reconstruction biases estimates toward the null and limits our ability to detect VEI. Given the consistent direction of the bias, estimates obtained from trials using these methods will provide lower bounds on the true VEI. A combination of sequence and epidemiologic data results in the most accurate estimates, underscoring the importance of contact tracing.
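VEI is commonly defined as one minus the ratio of the transmission rate from vaccinated infectors to that from unvaccinated infectors. Assuming that ratio definition (this is not the estimator proposed in the paper), the sketch below weights each infection by the probability assigned to each candidate infector, which is how uncertain transmission-network reconstruction enters the estimate. The function name, input layout, and toy numbers are hypothetical.

```python
from typing import Dict, List


def estimate_ve_i(
    infector_probs: List[Dict[str, float]],
    vaccinated: Dict[str, bool],
    exposures_vacc: int,
    exposures_unvacc: int,
) -> float:
    """VE_I = 1 - (rate from vaccinated infectors) / (rate from unvaccinated infectors).

    infector_probs: one dict per infected case, mapping each candidate infector
    to the probability that it was the infector (hypothetical input format).
    exposures_*: number of susceptible contacts exposed to each infector group.
    """
    expected_from_vacc = 0.0
    expected_from_unvacc = 0.0
    for probs in infector_probs:
        for infector, p in probs.items():
            if vaccinated[infector]:
                expected_from_vacc += p
            else:
                expected_from_unvacc += p
    rate_vacc = expected_from_vacc / exposures_vacc
    rate_unvacc = expected_from_unvacc / exposures_unvacc
    return 1.0 - rate_vacc / rate_unvacc


vaccinated = {"v1": True, "u1": False}
# Perfect knowledge: 2 infections by v1 and 8 by u1, with 100 exposed contacts per group.
perfect = [{"v1": 1.0}] * 2 + [{"u1": 1.0}] * 8
# Imperfect reconstruction: 30% of each case's probability goes to the wrong candidate.
noisy = [{"v1": 0.7, "u1": 0.3}] * 2 + [{"u1": 0.7, "v1": 0.3}] * 8
ve_perfect = estimate_ve_i(perfect, vaccinated, 100, 100)  # 0.75
ve_noisy = estimate_ve_i(noisy, vaccinated, 100, 100)      # roughly 0.39
```

Symmetric misattribution shuffles infections between the two groups, pushing the rate ratio toward 1 and the estimate toward 0, which matches the direction of bias described in the abstract.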
Affiliation(s)
- Rebecca Kahn
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
- Rui Wang
- Department of Population Medicine, Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, Massachusetts, USA
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
- Sarah V. Leavitt
- Department of Biostatistics, School of Public Health, Boston University, Boston, Massachusetts, USA
- William P. Hanage
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
- Marc Lipsitch
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
- Department of Immunology and Infectious Diseases, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
6. Dawson D, Rasmussen D, Peng X, Lanzas C. Inferring environmental transmission using phylodynamics: a case-study using simulated evolution of an enteric pathogen. J R Soc Interface 2021; 18:20210041. [PMID: 34102084 DOI: 10.1098/rsif.2021.0041]
Abstract
Indirect (environmental) and direct (host-host) transmission pathways cannot easily be distinguished when they co-occur in epidemics, particularly when they occur on similar time scales. Phylodynamic reconstruction is a potential approach to this problem that combines epidemiological information (temporal and spatial) with pathogen whole-genome sequencing data to infer transmission trees of epidemics. However, factors such as differences in mutation and transmission rates between host and non-host environments may obscure phylogenetic inference from these methods. In this study, we used a network-based transmission model that explicitly models pathogen evolution to simulate epidemics with both direct and indirect transmission. Epidemics were simulated according to factorial combinations of direct/indirect transmission proportions, host mutation rates and conditions of environmental pathogen growth. Transmission trees were then reconstructed using the phylodynamic approach SCOTTI (structured coalescent transmission tree inference) and evaluated. We found that although insufficient diversity sets a lower bound on when accurate phylodynamic inferences can be made, transmission routes and assumed pathogen lifestyle affected pathogen population structure and subsequently influenced both reconstruction success and the likelihood of direct versus indirect pathways being reconstructed. We conclude that prior knowledge of the likely ecology and population structure of pathogens in host and non-host environments is critical to making full use of phylodynamic techniques.
Affiliation(s)
- Daniel Dawson
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- David Rasmussen
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, USA
- Xinxia Peng
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- Cristina Lanzas
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
7.
Abstract
Within-host adaptation is a hallmark of chronic bacterial infections, involving substantial genomic changes. Recent large-scale genomic data from prolonged infections allow the examination of adaptive strategies employed by different pathogens and open the door to investigate whether they converge toward similar strategies. Here, we compiled extensive data of whole-genome sequences of bacterial isolates belonging to miscellaneous species sampled at sequential time points during clinical infections. Analysis of these data revealed that different species share some common adaptive strategies, achieved by mutating various genes. Although the same genes were often mutated in several strains within a species, different genes related to the same pathway, structure, or function were changed in other species utilizing the same adaptive strategy (e.g., mutating flagellar genes). Strategies exploited by various bacterial species were often predicted to be driven by the host immune system, a powerful selective pressure that is not species specific. Remarkably, we find adaptive strategies identified previously within single species to be ubiquitous. Two striking examples are shifts from siderophore-based to heme-based iron scavenging (previously shown for Pseudomonas aeruginosa) and changes in glycerol-phosphate metabolism (previously shown to decrease sensitivity to antibiotics in Mycobacterium tuberculosis). Virulence factors were often adaptively affected in different species, indicating shifts from acute to chronic virulence and virulence attenuation during infection. Our study presents a global view on common within-host adaptive strategies employed by different bacterial species and provides a rich resource for further studying these processes.
Affiliation(s)
- Yair E Gatt
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Hanah Margalit
- Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
8. Lequime S, Bastide P, Dellicour S, Lemey P, Baele G. nosoi: A stochastic agent-based transmission chain simulation framework in R. Methods Ecol Evol 2020; 11:1002-1007. [PMID: 32983401 PMCID: PMC7496779 DOI: 10.1111/2041-210x.13422] [Received: 03/03/2020] [Accepted: 05/13/2020]
Abstract
The transmission process of an infectious agent creates a connected chain of hosts linked by transmission events, known as a transmission chain. Reconstructing transmission chains remains a challenging endeavour, except in rare cases characterized by intense surveillance and epidemiological inquiry. Inference frameworks attempt to estimate or approximate these transmission chains, but the accuracy and validity of such methods generally lack formal assessment on datasets for which the actual transmission chain was observed. We here introduce nosoi, an open-source R package that offers a complete, tunable and expandable agent-based framework to simulate transmission chains under a wide range of epidemiological scenarios for single-host and dual-host epidemics. nosoi is accessible through GitHub and CRAN, and is accompanied by extensive documentation, providing help and practical examples to assist users in setting up their own simulations. Once infected, each host or agent can undergo a series of events during each time step, such as moving (between locations) or transmitting the infection, all of these being driven by user-specified rules or data, such as travel patterns between locations. nosoi is able to generate a multitude of epidemic scenarios that can, for example, be used to validate a wide range of reconstruction methods, including epidemic modelling and phylodynamic analyses. nosoi also offers a comprehensive framework to leverage empirically acquired data, allowing the user to explore how variations in parameters can affect epidemic potential. Aside from research questions, nosoi can provide lecturers with a complete teaching tool to offer students a hands-on exploration of the dynamics of epidemiological processes and the factors that affect them. Because the package does not rely on mathematical formalism but uses a more intuitive algorithmic approach, even extensive changes of the entire model can be easily and quickly implemented.
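nosoi itself is an R package, and the Python sketch below does not reproduce its API. It only mirrors the general scheme described above: at each time step every active host may exit or attempt transmissions according to user-supplied rules, and every transmission is recorded so that the true chain is known. The rule functions `p_exit` and `n_contacts`, the fixed `p_transmit`, and the `Host` record are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    host_id: int
    infected_by: Optional[int]  # None for the index case
    time_infected: int
    active: bool = True


def simulate_chain(
    p_exit: Callable[[int], float],              # rule: exit probability given time since infection
    n_contacts: Callable[[random.Random], int],  # rule: contacts per active host per step
    p_transmit: float,
    max_time: int,
    max_hosts: int,
    rng: random.Random,
) -> List[Host]:
    """Record who infected whom, and when, under simple per-step rules."""
    hosts = [Host(host_id=0, infected_by=None, time_infected=0)]
    for t in range(1, max_time + 1):
        # Snapshot hosts active at the start of the step; new infections act from the next step.
        for host in [h for h in hosts if h.active]:
            if rng.random() < p_exit(t - host.time_infected):
                host.active = False
                continue
            for _ in range(n_contacts(rng)):
                if len(hosts) < max_hosts and rng.random() < p_transmit:
                    hosts.append(Host(len(hosts), infected_by=host.host_id, time_infected=t))
    return hosts


# Usage: hosts stay infectious for 2 steps, make 2 contacts per step, and always transmit.
chain = simulate_chain(
    p_exit=lambda age: 0.0 if age < 3 else 1.0,
    n_contacts=lambda r: 2,
    p_transmit=1.0,
    max_time=10,
    max_hosts=50,
    rng=random.Random(42),
)
```

Because the simulator keeps the `infected_by` link for every host, its output can serve as a known "true" chain against which a reconstruction method is scored.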
Affiliation(s)
- Sebastian Lequime
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Cluster of Microbial Ecology, Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Paul Bastide
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- IMAG, CNRS, University of Montpellier, Montpellier, France
- Simon Dellicour
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Brussels, Belgium
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
9. Moshiri N, Ragonnet-Cronin M, Wertheim JO, Mirarab S. FAVITES: simultaneous simulation of transmission networks, phylogenetic trees and sequences. Bioinformatics 2019; 35:1852-1861. [PMID: 30395173 DOI: 10.1093/bioinformatics/bty921] [Received: 06/15/2018] [Revised: 10/29/2018] [Accepted: 11/01/2018]
Abstract
MOTIVATION The ability to simulate epidemics as a function of model parameters allows insights that are unobtainable from real datasets. Further, reconstructing transmission networks for fast-evolving viruses like Human Immunodeficiency Virus (HIV) may have the potential to greatly enhance epidemic intervention, but transmission network reconstruction methods have been inadequately studied, largely because it is difficult to obtain 'truth' sets on which to test them and properly measure their performance. RESULTS We introduce FrAmework for VIral Transmission and Evolution Simulation (FAVITES), a robust framework for simulating realistic datasets for epidemics that are caused by fast-evolving pathogens like HIV. FAVITES creates a generative model to produce contact networks, transmission networks, phylogenetic trees and sequence datasets, and to add error to the data. FAVITES is designed to be extensible by dividing the generative model into modules, each of which is expressed as a fixed API that can be implemented using various models. We use FAVITES to simulate HIV datasets and study the realism of the simulated datasets. We then use the simulated data to study the impact of increased treatment efforts on epidemiological outcomes. We also study two transmission network reconstruction methods and their effectiveness in detecting fast-growing clusters. AVAILABILITY AND IMPLEMENTATION FAVITES is available at https://github.com/niemasd/FAVITES, and a Docker image can be found on DockerHub (https://hub.docker.com/r/niemasd/favites). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
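The design point that each module is a fixed API with interchangeable implementations can be illustrated with an abstract base class. The class and method names below are hypothetical and do not reproduce FAVITES's actual module interfaces; the sketch only shows how downstream code can depend on the interface rather than on a particular model.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Tuple

Edge = Tuple[int, int]


class ContactNetworkModule(ABC):
    """Hypothetical module interface: every implementation returns an undirected edge list."""

    @abstractmethod
    def generate(self, n: int, rng: random.Random) -> List[Edge]:
        ...


class CompleteGraph(ContactNetworkModule):
    """Every pair of individuals is in contact (rng is unused)."""

    def generate(self, n: int, rng: random.Random) -> List[Edge]:
        return [(i, j) for i in range(n) for j in range(i + 1, n)]


class ErdosRenyi(ContactNetworkModule):
    """Each possible contact is present independently with probability p."""

    def __init__(self, p: float) -> None:
        self.p = p

    def generate(self, n: int, rng: random.Random) -> List[Edge]:
        return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < self.p]


def run_pipeline(network: ContactNetworkModule, n: int, seed: int) -> int:
    """Downstream step that only relies on the interface, so modules are swappable."""
    edges = network.generate(n, random.Random(seed))
    return len(edges)


n_complete = run_pipeline(CompleteGraph(), n=5, seed=0)  # 10 edges
n_random = run_pipeline(ErdosRenyi(p=0.3), n=5, seed=0)
```

Swapping `CompleteGraph` for `ErdosRenyi` requires no change to `run_pipeline`; the same pattern would apply to later stages such as transmission, phylogeny, or sequence-evolution modules.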
Affiliation(s)
- Niema Moshiri
- Bioinformatics and Systems Biology Graduate Program, UC San Diego, La Jolla, USA
- Siavash Mirarab
- Department of Electrical and Computer Engineering, UC San Diego, La Jolla, USA
10. Alamil M, Hughes J, Berthier K, Desbiez C, Thébaud G, Soubeyrand S. Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180258. [PMID: 31056055 PMCID: PMC6553606 DOI: 10.1098/rstb.2018.0258]
Abstract
Pathogen sequence data have been exploited to infer who infected whom, by using empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight into epidemiological links than a single sequence per host. In general, a mechanistic viewpoint on transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded in statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained from contact tracing, for example, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable when mechanistic assumptions risk being erroneous; it is sufficiently parsimonious to allow the handling of big datasets in the future; and it is versatile enough to be applied to very different contexts from animal, human and plant epidemiology. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’. This issue is linked with the subsequent theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’.
Affiliation(s)
- M Alamil
- BioSP, INRA, 84914 Avignon, France
- J Hughes
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- K Berthier
- Pathologie Végétale, INRA, 84140 Montfavet, France
- C Desbiez
- Pathologie Végétale, INRA, 84140 Montfavet, France
- G Thébaud
- BGPI, INRA, Univ. Montpellier, SupAgro, Cirad, 34398 Montpellier, France
11. Miller JK, Chen J, Sundermann A, Marsh JW, Saul MI, Shutt KA, Pacey M, Mustapha MM, Harrison LH, Dubrawski A. Statistical outbreak detection by joining medical records and pathogen similarity. J Biomed Inform 2019; 91:103126. [PMID: 30771483 PMCID: PMC6424617 DOI: 10.1016/j.jbi.2019.103126] [Received: 08/16/2018] [Revised: 01/05/2019] [Accepted: 02/06/2019]
Abstract
We present a statistical inference model for the detection and characterization of outbreaks of hospital associated infection. The approach combines patient exposures, determined from electronic medical records, and pathogen similarity, determined by whole-genome sequencing, to simultaneously identify probable outbreaks and their root-causes. We show how our model can be used to target isolates for whole-genome sequencing, improving outbreak detection and characterization even without comprehensive sequencing. Additionally, we demonstrate how to learn model parameters from reference data of known outbreaks. We demonstrate model performance using semi-synthetic experiments.
Affiliation(s)
- James K Miller
- Auton Lab, Carnegie Mellon University, Pittsburgh, PA, United States
- Jieshi Chen
- Auton Lab, Carnegie Mellon University, Pittsburgh, PA, United States
- Alexander Sundermann
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States; Department of Infection Control and Hospital Epidemiology, University of Pittsburgh Medical Center, Pittsburgh, PA, United States
- Jane W Marsh
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
- Melissa I Saul
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
- Kathleen A Shutt
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
- Marissa Pacey
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
- Mustapha M Mustapha
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
- Lee H Harrison
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
- Artur Dubrawski
- Auton Lab, Carnegie Mellon University, Pittsburgh, PA, United States
12. De Maio N, Worby CJ, Wilson DJ, Stoesser N. Bayesian reconstruction of transmission within outbreaks using genomic variants. PLoS Comput Biol 2018; 14:e1006117. [PMID: 29668677 PMCID: PMC5927459 DOI: 10.1371/journal.pcbi.1006117] [Received: 11/09/2017] [Revised: 04/30/2018] [Accepted: 04/03/2018]
Abstract
Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those arising from sequencing error and specifying a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations, we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open-source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.
We present a new tool to reconstruct transmission events within outbreaks. Our approach makes use of pathogen genetic information, notably low-frequency within-host genetic variants that are usually discarded, and combines it with epidemiological information on host exposure to infection. This leads to accurate reconstruction of transmission even in cases where abundant within-host pathogen genetic variation and weak transmission bottlenecks (multiple pathogen units colonising a new host at transmission) would otherwise make inference difficult, because the transmission history differs from the pathogen evolutionary history inferred from pathogen isolates. Also, the use of within-host pathogen genomic variants increases the resolution of the reconstruction of the transmission tree even in scenarios with limited within-outbreak pathogen genetic diversity: within-host pathogen populations that appear identical at the level of consensus sequences can be discriminated using within-host variants. Our Bayesian approach provides a measure of the confidence in different possible transmission histories, and is published as open-source software. We show with simulations and with an analysis of the beginning of the 2014 Ebola outbreak that our approach is applicable in many scenarios, improves our understanding of transmission dynamics, and will contribute to finding and limiting sources and routes of transmission, and therefore to preventing the spread of infectious disease.
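The challenge of separating genuine within-host variants from sequencing error can be illustrated with a deliberately simple per-site test. This is not BadTrIP's model, which handles sequencing error jointly with within-host evolution and transmission inside BEAST2. The sketch only assumes that, at a site with no true minor variant, each of the `depth` reads independently shows the alternative base with a fixed error probability, so the alternative-read count is binomial. The function names, the 1% error rate, and the 0.001 threshold are hypothetical choices for illustration.

```python
from math import exp, lgamma, log, log1p


def binomial_log_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(exp(binomial_log_pmf(i, n, p)) for i in range(k, n + 1))


def is_likely_genuine_variant(alt_reads: int, depth: int,
                              error_rate: float = 0.01,
                              alpha: float = 0.001) -> bool:
    """Hypothetical helper: flag a site whose alternative-read count would be
    implausible if every alternative read were a sequencing error.

    Assumes independent reads, each showing the alternative base with
    probability `error_rate` when no true minor variant is present.
    """
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must lie between 0 and depth")
    return binomial_upper_tail(alt_reads, depth, error_rate) < alpha


print(is_likely_genuine_variant(2, 200))   # False: 2 of 200 reads fits a 1% error rate
print(is_likely_genuine_variant(30, 200))  # True: 30 of 200 reads is far above it
```

Testing each site on its own echoes the unlinked-variant assumption mentioned in the abstract, but a fixed-threshold test like this ignores within-host dynamics and transmission, which BadTrIP models together.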
Affiliation(s)
- Nicola De Maio
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Colin J Worby
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
- Daniel J Wilson
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Nicole Stoesser
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
13
Worby CJ, Lipsitch M, Hanage WP. Shared Genomic Variants: Identification of Transmission Routes Using Pathogen Deep-Sequence Data. Am J Epidemiol 2017; 186:1209-1216. [PMID: 29149252 PMCID: PMC5860558 DOI: 10.1093/aje/kwx182] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 06/28/2016] [Accepted: 01/18/2017] [Indexed: 12/11/2022]
Abstract
Sequencing pathogen samples during a communicable disease outbreak is becoming an increasingly common procedure in epidemiologic investigations. Identifying who infected whom sheds considerable light on transmission patterns, high-risk settings and subpopulations, and the effectiveness of infection control. Genomic data shed new light on transmission dynamics and can be used to identify clusters of individuals likely to be linked by direct transmission. However, identification of individual routes of infection via single genome samples typically remains uncertain. We investigated the potential of deep sequence data to provide greater resolution on transmission routes, via the identification of shared genomic variants. We assessed several easily implemented methods to identify transmission routes using both shared variants and genetic distance, demonstrating that shared variants can provide considerable additional information in most scenarios. While shared-variant approaches identify relatively few links in the presence of a small transmission bottleneck, these links are highly accurate. Furthermore, we propose a hybrid approach that also incorporates phylogenetic distance to provide greater resolution. We applied our methods to data collected during the 2014 Ebola outbreak, identifying several likely routes of transmission. Our study highlights the power of data from deep sequencing of pathogens as a component of outbreak investigation and epidemiologic analyses.
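The idea of linking hosts through shared within-host variants, with genetic distance as a secondary signal, can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the specific methods assessed in the paper: each host is summarized by a hypothetical `HostSample` holding an aligned consensus sequence and a set of minor-variant identifiers; the candidate source for a host is the other host sharing the most minor variants; and ties (including the case where nothing is shared) are broken by the smaller Hamming distance between consensus sequences. Timing of infection, which a real analysis would need, is ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HostSample:
    host_id: str
    consensus: str                   # aligned consensus sequence (hypothetical input)
    minor_variants: frozenset[str]   # identifiers of within-host minor variants


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("consensus sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def best_candidate_source(target: HostSample,
                          others: list[HostSample]) -> tuple[str, int, int] | None:
    """Return (source_id, shared_variant_count, consensus_distance) for the host
    sharing the most minor variants with `target`; smaller distance breaks ties."""
    candidates = [o for o in others if o.host_id != target.host_id]
    if not candidates:
        return None

    def score(o: HostSample) -> tuple[int, int]:
        shared = len(target.minor_variants & o.minor_variants)
        return (-shared, hamming(target.consensus, o.consensus))

    best = min(candidates, key=score)
    neg_shared, dist = score(best)
    return best.host_id, -neg_shared, dist


a = HostSample("A", "ACGTACGT", frozenset({"var1", "var2"}))
b = HostSample("B", "ACGTACGA", frozenset({"var1"}))
c = HostSample("C", "ACGTACGT", frozenset())
print(best_candidate_source(a, [b, c]))  # ('B', 1, 1)
```

Host B is chosen despite C having the identical consensus, because B shares a minor variant with A. Falling back on distance only when variant sharing cannot decide is one way to picture combining the two signals; the paper's hybrid approach uses phylogenetic distance rather than raw consensus differences.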
Affiliation(s)
- Colin J Worby
- Correspondence to Dr. Colin J. Worby, Department of Ecology and Evolutionary Biology, Princeton University, 106A Guyot Hall, Princeton, NJ 08544
14
Didelot X, Fraser C, Gardy J, Colijn C. Genomic Infectious Disease Epidemiology in Partially Sampled and Ongoing Outbreaks. Mol Biol Evol 2017; 34:997-1007. [PMID: 28100788 PMCID: PMC5850352 DOI: 10.1093/molbev/msw275] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1] [Indexed: 11/13/2022]
Abstract
Genomic data are increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom: a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by coloring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree, and we demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch coloring approach can incorporate a variable number of unique colors to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible-jump Markov chain Monte Carlo (MCMC) sampler, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from https://github.com/xavierdidelot/TransPhylo.
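How a colored phylogeny yields a transmission tree can be shown on a toy example. In this sketch each node of a rooted tree carries a hypothetical `host` label (its color), and every branch whose child host differs from its parent host is read as a transmission from the parent's host to the child's host. It assumes every host is sampled and each branch carries at most one color change; TransPhylo itself also allows unsampled intermediate hosts and infers the coloring by MCMC rather than taking it as given, so this is only the bookkeeping step, not the inference.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    host: str                                          # host ("color") of this lineage
    children: list[Node] = field(default_factory=list)


def transmission_tree(root: Node) -> dict[str, str]:
    """Map each infectee to its infector by reading host changes along branches.

    Assumes all hosts are sampled and each branch has at most one host change.
    """
    infector_of: dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if child.host != node.host:
                # A host may be entered by several lineages (weak bottleneck),
                # but all of them must come from the same infector.
                previous = infector_of.setdefault(child.host, node.host)
                if previous != node.host:
                    raise ValueError(f"host {child.host} has two infectors")
            stack.append(child)
    return infector_of


tree = Node("A", [
    Node("A", [Node("A"), Node("B", [Node("B"), Node("C")])]),
    Node("D"),
])
print(transmission_tree(tree))  # {'D': 'A', 'B': 'A', 'C': 'B'}
```

The phylogeny has five leaves but the transmission tree has only four hosts, which is the sense in which a phylogenetic tree is not a transmission tree.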
Affiliation(s)
- Xavier Didelot
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, London, United Kingdom
- Christophe Fraser
- Department of Infectious Disease Epidemiology, Imperial College London, Norfolk Place, London, United Kingdom
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Jennifer Gardy
- Communicable Disease Prevention and Control Services, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
- Caroline Colijn
- Department of Mathematics, Imperial College, London, United Kingdom
15
Guthrie JL, Gardy JL. A brief primer on genomic epidemiology: lessons learned from Mycobacterium tuberculosis. Ann N Y Acad Sci 2016; 1388:59-77. [PMID: 28009051 DOI: 10.1111/nyas.13273] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 07/05/2016] [Revised: 09/02/2016] [Accepted: 09/13/2016] [Indexed: 12/13/2022]
Abstract
Genomics is now firmly established as a technique for the investigation and reconstruction of communicable disease outbreaks, with many genomic epidemiology studies focusing on revealing transmission routes of Mycobacterium tuberculosis. In this primer, we introduce the basic techniques underlying transmission inference from genomic data, using illustrative examples from M. tuberculosis and other pathogens routinely sequenced by public health agencies. We describe the laboratory and epidemiological scenarios under which genomics may or may not be used, provide an introduction to sequencing technologies and bioinformatics approaches to identifying transmission-informative variation and resistance-associated mutations, and discuss how variation must be considered in the light of available clinical and epidemiological information to infer transmission.
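One simple, widely used way of turning genomic data into candidate transmission clusters is to group isolates whose pairwise SNP distance falls at or below a threshold. The primer's own procedures are not reproduced here; this sketch assumes aligned consensus sequences, counts differing positions as the SNP distance, and joins isolates by single linkage using union-find. The threshold is a hypothetical parameter: in practice it is pathogen-specific and, as the abstract stresses, must be interpreted alongside clinical and epidemiological information.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_snp_threshold(seqs: dict[str, str], threshold: int) -> list[set[str]]:
    """Single-linkage clusters: two isolates share a cluster when a chain of
    pairs, each within `threshold` SNPs, connects them."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


isolates = {"P1": "AAAAAAAA", "P2": "AAAAAAAT", "P3": "AAAAAATT", "P4": "TTTTAAAA"}
print(cluster_by_snp_threshold(isolates, threshold=1))
# P1, P2 and P3 form one cluster (P1 and P3 are linked through P2); P4 stays alone
```

Single linkage chains isolates together even when the ends of a chain differ by more than the threshold, as P1 and P3 do here; that is one reason a cluster flags possible transmission rather than establishing who infected whom.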
Affiliation(s)
- Jennifer L Guthrie
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada
- Jennifer L Gardy
- School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada; Communicable Disease Prevention and Control Services, British Columbia Centre for Disease Control, Vancouver, British Columbia, Canada
16
Worby CJ, O'Neill PD, Kypraios T, Robotham JV, De Angelis D, Cartwright EJP, Peacock SJ, Cooper BS. Reconstructing transmission trees for communicable diseases using densely sampled genetic data. Ann Appl Stat 2016; 10:395-417. [PMID: 27042253 DOI: 10.1214/15-aoas898] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Indexed: 12/21/2022]
Abstract
Whole genome sequencing of pathogens from multiple hosts in an epidemic offers the potential to investigate who infected whom with unparalleled resolution, potentially yielding important insights into disease dynamics and the impact of control measures. We considered disease outbreaks in a setting with dense genomic sampling, and formulated stochastic epidemic models to investigate person-to-person transmission, based on observed genomic and epidemiological data. We constructed models in which the genetic distance between sampled genotypes depends on the epidemiological relationship between the hosts. A data augmented Markov chain Monte Carlo algorithm was used to sample over the transmission trees, providing a posterior probability for any given transmission route. We investigated the predictive performance of our methodology using simulated data, demonstrating high sensitivity and specificity, particularly for rapidly mutating pathogens with low transmissibility. We then analyzed data collected during an outbreak of methicillin-resistant Staphylococcus aureus in a hospital, identifying probable transmission routes and estimating epidemiological parameters. Our approach overcomes limitations of previous methods, providing a framework with the flexibility to allow for unobserved infection times, multiple independent introductions of the pathogen, and within-host genetic diversity, as well as allowing forward simulation.
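How a genetic distance can depend on the epidemiological relationship between hosts is easiest to see with a simple mutation model. Purely for illustration, this sketch assumes that SNP differences accumulate as a Poisson process, so the expected distance between two samples is a mutation rate times the evolutionary time separating them, with that time computed under a single-lineage-per-host, one-genotype-bottleneck assumption. Candidate infectors are then ranked by the log-likelihood of the observed distance. The rate, the time formula, and all function names are hypothetical; the paper's models are richer, allowing unobserved infection times, multiple introductions, and within-host diversity, and sampling over transmission trees by data-augmented MCMC.

```python
from math import lgamma, log


def poisson_log_pmf(k: int, mean: float) -> float:
    """log P(K = k) for K ~ Poisson(mean)."""
    if mean <= 0:
        return 0.0 if k == 0 else float("-inf")
    return k * log(mean) - mean - lgamma(k + 1)


def divergence_time(t_infect_target: float, t_sample_target: float,
                    t_sample_source: float) -> float:
    """Time separating the two samples along the pathogen lineage, assuming one
    lineage per host and that the target received the source's genotype at
    the target's infection time."""
    return abs(t_sample_source - t_infect_target) + (t_sample_target - t_infect_target)


def rank_sources(t_infect_target: float, t_sample_target: float,
                 candidates: dict[str, tuple[float, int]],
                 rate: float) -> list[tuple[str, float]]:
    """Rank candidate infectors by Poisson log-likelihood of the SNP distance.

    candidates maps source_id -> (source sampling time, SNP distance to target);
    rate is a hypothetical substitution rate per genome per unit time.
    """
    scored = []
    for source, (t_sample_source, snps) in candidates.items():
        t = divergence_time(t_infect_target, t_sample_target, t_sample_source)
        scored.append((source, poisson_log_pmf(snps, rate * t)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_sources(10.0, 20.0, {"X": (15.0, 1), "Y": (5.0, 6)}, rate=0.05)
print(ranking[0][0])  # X
```

Both candidates sit 15 time units from the target's sample, so the expected distance is 0.75 SNPs for each, and the one SNP separating X is far more plausible than the six separating Y. In this toy model a higher `rate` spreads expected distances further apart, which offers some intuition for why the abstract reports the best performance for rapidly mutating pathogens.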
Affiliation(s)
- Colin J Worby
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK; Center for Communicable Disease Dynamics, Harvard TH Chan School of Public Health, Boston, USA
- Philip D O'Neill
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
- Theodore Kypraios
- School of Mathematical Sciences, University of Nottingham, Nottingham, UK
- Edward J P Cartwright
- Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
- Sharon J Peacock
- Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Ben S Cooper
- Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK; Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand