101
|
Phylodynamic inference for structured epidemiological models. PLoS Comput Biol 2014; 10:e1003570. [PMID: 24743590 PMCID: PMC3990497 DOI: 10.1371/journal.pcbi.1003570] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 11/01/2013] [Accepted: 02/28/2014] [Indexed: 01/05/2023] Open
Abstract
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates.
Mathematical models play an important role in our understanding of what processes drive the complex population dynamics of infectious pathogens. Yet developing statistical methods for fitting models to epidemiological data is difficult. Epidemiological data is often noisy, incomplete, aggregated across different scales and generally provides only a partial picture of the underlying disease dynamics. Using nontraditional sources of data, like molecular sequences of pathogens, can provide additional information about epidemiological dynamics. But current “phylodynamic” inference methods for fitting models to genealogies reconstructed from sequence data have a number of major limitations. We present a statistical framework that builds upon earlier work to address two of these limitations: population structure and stochasticity. By incorporating population structure, our framework can be applied in cases where the host population is divided into different subpopulations, such as by spatial isolation. Our framework also takes into consideration stochastic noise and can therefore capture the inherent variability of epidemiological dynamics. These advances allow for a much wider class of epidemiological models to be fit to genealogies in order to estimate key epidemiological parameters and to reconstruct past disease dynamics.
|
102
|
Didelot X, Gardy J, Colijn C. Bayesian inference of infectious disease transmission from whole-genome sequence data. Mol Biol Evol 2014; 31:1869-79. [PMID: 24714079 PMCID: PMC4069612 DOI: 10.1093/molbev/msu121] [Citation(s) in RCA: 138] [Impact Index Per Article: 13.8] [Indexed: 12/21/2022] Open
Abstract
Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered—how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely sampled outbreaks from genomic data while considering within-host diversity. We infer a time-labeled phylogeny using Bayesian evolutionary analysis by sampling trees (BEAST), and then infer a transmission network via a Monte Carlo Markov chain. We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.
Affiliation(s)
- Xavier Didelot
  - Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- Jennifer Gardy
  - Communicable Disease Prevention and Control Services, British Columbia Centre for Disease Control, Vancouver, BC, Canada; School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada
- Caroline Colijn
  - Department of Mathematics, Imperial College London, United Kingdom
|
103
|
Beard R, Magee D, Suchard MA, Lemey P, Scotch M. Generalized linear models for identifying predictors of the evolutionary diffusion of viruses. AMIA Joint Summits on Translational Science Proceedings 2014; 2014:23-8. [PMID: 25717395 PMCID: PMC4333690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Bioinformatics and phylogeography models use viral sequence data to analyze spread of epidemics and pandemics. However, few of these models have included analytical methods for testing whether certain predictors such as population density, rates of disease migration, and climate are drivers of spatial spread. Understanding the specific factors that drive spatial diffusion of viruses is critical for targeting public health interventions and curbing spread. In this paper we describe the application and evaluation of a model that integrates demographic and environmental predictors with molecular sequence data. The approach parameterizes evolutionary spread of RNA viruses as a generalized linear model (GLM) within a Bayesian inference framework using Markov chain Monte Carlo (MCMC). We evaluate this approach by reconstructing the spread of H5N1 in Egypt while assessing the impact of individual predictors on evolutionary diffusion of the virus.
|
104
|
Worby CJ, Lipsitch M, Hanage WP. Within-host bacterial diversity hinders accurate reconstruction of transmission networks from genomic distance data. PLoS Comput Biol 2014; 10:e1003549. [PMID: 24675511 PMCID: PMC3967931 DOI: 10.1371/journal.pcbi.1003549] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Received: 10/18/2013] [Accepted: 02/17/2014] [Indexed: 11/18/2022] Open
Abstract
The prospect of using whole genome sequence data to investigate bacterial disease outbreaks has been keenly anticipated in many quarters, and the large-scale collection and sequencing of isolates from cases is becoming increasingly feasible. While sequence data can provide many important insights into disease spread and pathogen adaptation, it remains unclear how successfully they may be used to estimate individual routes of transmission. Several studies have attempted to reconstruct transmission routes using genomic data; however, these have typically relied upon restrictive assumptions, such as a shared topology of the phylogenetic tree and a lack of within-host diversity. In this study, we investigated the potential for bacterial genomic data to inform transmission network reconstruction. We used simulation models to investigate the origins, persistence and onward transmission of genetic diversity, and examined the impact of such diversity on our estimation of the epidemiological relationship between carriers. We used a flexible distance-based metric to provide a weighted transmission network, and used receiver-operating characteristic (ROC) curves and network entropy to assess the accuracy and uncertainty of the inferred structure. Our results suggest that sequencing a single isolate from each case is inadequate in the presence of within-host diversity, and is likely to result in misleading interpretations of transmission dynamics – under many plausible conditions, this may be little better than selecting transmission links at random. Sampling more frequently improves accuracy, but much uncertainty remains, even if all genotypes are observed. While it is possible to discriminate between clusters of carriers, individual transmission routes cannot be resolved by sequence data alone. Our study demonstrates that bacterial genomic distance data alone provide only limited information on person-to-person transmission dynamics. 
With the advent of affordable large-scale genome sequencing for bacterial pathogens, there is much interest in using such data to identify who infected whom in a disease outbreak. Many methods exist to reconstruct the phylogeny of sampled bacteria, but the resulting tree does not necessarily share the same structure as the transmission tree linking infected persons. We explored the potential of sampled genomic data to inform the transmission tree, measuring the accuracy and precision of estimated networks based on simulated data. We demonstrated that failing to account for within-host diversity can lead to poor network reconstructions - even with repeated sampling of each carrier, there is still much uncertainty in the estimated structure. While it may be possible to identify clusters of potential sources, identifying individual transmission links is not possible using bacterial sequence data alone. This work highlights potential limitations of genomic data to investigate transmission dynamics, lending support to methods unifying all available data sources.
Affiliation(s)
- Colin J. Worby
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Marc Lipsitch
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America
- William P. Hanage
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America
|
105
|
Kao RR, Haydon DT, Lycett SJ, Murcia PR. Supersize me: how whole-genome sequencing and big data are transforming epidemiology. Trends Microbiol 2014; 22:282-91. [PMID: 24661923 PMCID: PMC7125769 DOI: 10.1016/j.tim.2014.02.011] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Received: 12/08/2013] [Revised: 02/17/2014] [Accepted: 02/24/2014] [Indexed: 01/08/2023]
Abstract
Whole-genome sequencing is used for forensic epidemiology. Big data can transform forensic epidemiology. Clustering, biases, wildlife reservoirs, and emerging infections can all be addressed. Phylodynamics approaches to integrate epidemiological and evolutionary data have been highly successful but still face scientific challenges.
In epidemiology, the identification of ‘who infected whom’ allows us to quantify key characteristics such as incubation periods, heterogeneity in transmission rates, duration of infectiousness, and the existence of high-risk groups. Although invaluable, the existence of many plausible infection pathways makes this difficult, and epidemiological contact tracing either uncertain, logistically prohibitive, or both. The recent advent of next-generation sequencing technology allows the identification of traceable differences in the pathogen genome that are transforming our ability to understand high-resolution disease transmission, sometimes even down to the host-to-host scale. We review recent examples of the use of pathogen whole-genome sequencing for the purpose of forensic tracing of transmission pathways, focusing on the particular problems where evolutionary dynamics must be supplemented by epidemiological information on the most likely timing of events as well as possible transmission pathways. We also discuss potential pitfalls in the over-interpretation of these data, and highlight the manner in which a confluence of this technology with sophisticated mathematical and statistical approaches has the potential to produce a paradigm shift in our understanding of infectious disease transmission and control.
Affiliation(s)
- Rowland R Kao
  - Boyd Orr Centre for Population and Ecosystem Health, College of Medical, Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
- Daniel T Haydon
  - Boyd Orr Centre for Population and Ecosystem Health, College of Medical, Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
- Samantha J Lycett
  - Boyd Orr Centre for Population and Ecosystem Health, College of Medical, Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
- Pablo R Murcia
  - Medical Research Council (MRC) Centre for Virus Research, College of Medical, Veterinary and Life Sciences, University of Glasgow, G61 1QH, UK
|
106
|
Mollentze N, Nel LH, Townsend S, le Roux K, Hampson K, Haydon DT, Soubeyrand S. A Bayesian approach for inferring the dynamics of partially observed endemic infectious diseases from space-time-genetic data. Proc Biol Sci 2014; 281:20133251. [PMID: 24619442 DOI: 10.1098/rspb.2013.3251] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Indexed: 01/13/2023] Open
Abstract
We describe a statistical framework for reconstructing the sequence of transmission events between observed cases of an endemic infectious disease using genetic, temporal and spatial information. Previous approaches to reconstructing transmission trees have assumed all infections in the study area originated from a single introduction and that a large fraction of cases were observed. There are as yet no approaches appropriate for endemic situations in which a disease is already well established in a host population and in which there may be multiple origins of infection, or that can enumerate unobserved infections missing from the sample. Our proposed framework addresses these shortcomings, enabling reconstruction of partially observed transmission trees and estimating the number of cases missing from the sample. Analyses of simulated datasets show the method to be accurate in identifying direct transmissions, while introductions and transmissions via one or more unsampled intermediate cases could be identified at high to moderate levels of case detection. When applied to partial genome sequences of rabies virus sampled from an endemic region of South Africa, our method reveals several distinct transmission cycles with little contact between them, and direct transmission over long distances suggesting significant anthropogenic influence in the movement of infected dogs.
Affiliation(s)
- Nardus Mollentze
  - Department of Microbiology and Plant Pathology, University of Pretoria, Pretoria 0002, South Africa; Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK; Directorate of Veterinary Services, KwaZulu Natal Department of Agriculture and Environmental Affairs, Pietermaritzburg 3202, South Africa; INRA, UR546 Biostatistics and Spatial Processes, 84914 Avignon CEDEX 9, France
|
107
|
Harris SR, Okoro CK. Whole-Genome Sequencing for Rapid and Accurate Identification of Bacterial Transmission Pathways. J Microbiol Methods 2014. [DOI: 10.1016/bs.mim.2014.07.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
|