1
Carson J, Keeling M, Wyllie D, Ribeca P, Didelot X. Inference of Infectious Disease Transmission through a Relaxed Bottleneck Using Multiple Genomes Per Host. Mol Biol Evol 2024; 41:msad288. [PMID: 38168711 PMCID: PMC10798190 DOI: 10.1093/molbev/msad288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 12/21/2023] [Accepted: 12/29/2023] [Indexed: 01/05/2024] Open
Abstract
In recent times, pathogen genome sequencing has become increasingly used to investigate infectious disease outbreaks. When genomic data is sampled densely enough amongst infected individuals, it can help resolve who infected whom. However, transmission analysis cannot rely solely on a phylogeny of the genomes but must account for the within-host evolution of the pathogen, which blurs the relationship between phylogenetic and transmission trees. When only a single genome is sampled for each host, the uncertainty about who infected whom can be quite high. Consequently, transmission analysis based on multiple genomes of the same pathogen per host has a clear potential for delivering more precise results, even though it is more laborious to achieve. Here, we present a new methodology that can use any number of genomes sampled from a set of individuals to reconstruct their transmission network. Furthermore, we remove the need for the assumption of a complete transmission bottleneck. We use simulated data to show that our method becomes more accurate as more genomes per host are provided, and that it can infer key infectious disease parameters such as the size of the transmission bottleneck, within-host growth rate, basic reproduction number, and sampling fraction. We demonstrate the usefulness of our method in applications to real datasets from an outbreak of Pseudomonas aeruginosa amongst cystic fibrosis patients and a nosocomial outbreak of Klebsiella pneumoniae.
Affiliation(s)
- Jake Carson
  - Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK
  - School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
  - Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry CV4 7AL, UK
- Matt Keeling
  - Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK
  - School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
  - Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry CV4 7AL, UK
- Xavier Didelot
  - School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
  - Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research (SBIDER), University of Warwick, Coventry CV4 7AL, UK
  - Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
2
Susvitasari K, Tupper P, Stockdale JE, Colijn C. A method to estimate the serial interval distribution under partially-sampled data. Epidemics 2023; 45:100733. [PMID: 38056165 DOI: 10.1016/j.epidem.2023.100733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 11/22/2023] [Accepted: 11/26/2023] [Indexed: 12/08/2023] Open
Abstract
The serial interval of an infectious disease is an important variable in epidemiology. It is defined as the period of time between the symptom onset times of the infector and infectee in a direct transmission pair. Under partially sampled data, purported infector-infectee pairs may actually be separated by one or more unsampled cases in between. Misunderstanding such pairs as direct transmissions will result in overestimating the length of serial intervals. On the other hand, two cases that are infected by an unseen third case (known as coprimary transmission) may be classified as a direct transmission pair, leading to an underestimation of the serial interval. Here, we introduce a method to jointly estimate the distribution of serial intervals factoring in these two sources of error. We simultaneously estimate the distribution of the number of unsampled intermediate cases between purported infector-infectee pairs, as well as the fraction of such pairs that are coprimary. We also extend our method to situations where each infectee has multiple possible infectors, and show how to factor this additional source of uncertainty into our estimates. We assess our method's performance on simulated data sets and find that our method provides consistent and robust estimates. We also apply our method to data from real-life outbreaks of four infectious diseases and compare our results with published results. With similar accuracy, our method of estimating serial interval distribution provides unique advantages, allowing its application in settings of low sampling rates and large population sizes, such as widespread community transmission tracked by routine public health surveillance.
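The overestimation described in this abstract can be illustrated with a minimal sketch. This is not the authors' estimation method: it only computes the expected interval between purported pairs, assuming every transmission link has the same mean serial interval. The function name, the 5-day mean, and the distribution over unsampled intermediates are all hypothetical, and coprimary pairs are ignored.

```python
def apparent_mean_serial_interval(true_mean, intermediate_probs):
    """Expected interval between a purported infector-infectee pair.

    intermediate_probs[k] is the (hypothetical) probability that k unsampled
    cases sit between the two sampled cases. A pair with k intermediates spans
    k + 1 transmission links, so its expected interval is (k + 1) * true_mean,
    assuming every link has the same mean serial interval.
    """
    total = sum(intermediate_probs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("intermediate_probs must sum to 1")
    return sum(p * (k + 1) * true_mean for k, p in enumerate(intermediate_probs))


# Hypothetical example: true mean of 5 days; 70% direct pairs,
# 20% with one unsampled intermediate, 10% with two.
naive = apparent_mean_serial_interval(5.0, [0.7, 0.2, 0.1])
fully_sampled = apparent_mean_serial_interval(5.0, [1.0])
```

Under these assumed values, treating every purported pair as a direct transmission gives a mean longer than the true 5 days, which is the bias the abstract sets out to correct.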
Affiliation(s)
- Paul Tupper
  - Department of Mathematics, Simon Fraser University, Canada
3
Van der Roest BR, Bootsma MCJ, Fischer EAJ, Klinkenberg D, Kretzschmar MEE. A Bayesian inference method to estimate transmission trees with multiple introductions; applied to SARS-CoV-2 in Dutch mink farms. PLoS Comput Biol 2023; 19:e1010928. [PMID: 38011266 PMCID: PMC10703282 DOI: 10.1371/journal.pcbi.1010928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 12/07/2023] [Accepted: 11/12/2023] [Indexed: 11/29/2023] Open
Abstract
Knowledge of who infected whom during an outbreak of an infectious disease is important to determine risk factors for transmission and to design effective control measures. Both whole-genome sequencing of pathogens and epidemiological data provide useful information about the transmission events and underlying processes. Existing models to infer transmission trees usually assume that the pathogen is introduced only once from outside into the population of interest. However, this is not always true. For instance, SARS-CoV-2 is suggested to be introduced multiple times in mink farms in the Netherlands from the SARS-CoV-2 pandemic among humans. Here, we developed a Bayesian inference method combining whole-genome sequencing data and epidemiological data, allowing for multiple introductions of the pathogen in the population. Our method does not a priori split the outbreak into multiple phylogenetic clusters, nor does it break the dependency between the processes of mutation, within-host dynamics, transmission, and observation. We implemented our method as an additional feature in the R-package phybreak. On simulated data, our method correctly identifies the number of introductions, with an accuracy depending on the proportion of all observed cases that are introductions. Moreover, when a single introduction was simulated, our method produced similar estimates of parameters and transmission trees as the existing package. When applied to data from a SARS-CoV-2 outbreak in Dutch mink farms, the method provides strong evidence for independent introductions of the pathogen at 13 farms, infecting a total of 63 farms. Using the new feature of the phybreak package, transmission routes of a more complex class of infectious disease outbreaks can be inferred which will aid infection control in future outbreaks.
Affiliation(s)
- Bastiaan R. Van der Roest
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
- Martin C. J. Bootsma
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  - Department of Mathematics, Faculty of Science, Utrecht University, Utrecht, Netherlands
- Egil A. J. Fischer
  - Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, Netherlands
- Don Klinkenberg
  - Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
- Mirjam E. E. Kretzschmar
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, Netherlands
  - Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
4
Specht IOA, Petros BA, Moreno GK, Brock-Fisher T, Krasilnikova LA, Schifferli M, Yang K, Cronan P, Glennon O, Schaffner SF, Park DJ, MacInnis BL, Ozonoff A, Fry B, Mitzenmacher MD, Varilly P, Sabeti PC. Inferring Viral Transmission Pathways from Within-Host Variation. medRxiv 2023:2023.10.14.23297039. [PMID: 37873325 PMCID: PMC10593003 DOI: 10.1101/2023.10.14.23297039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Genome sequencing can offer critical insight into pathogen spread in viral outbreaks, but existing transmission inference methods use simplistic evolutionary models and only incorporate a portion of available genetic data. Here, we develop a robust evolutionary model for transmission reconstruction that tracks the genetic composition of within-host viral populations over time and the lineages transmitted between hosts. We confirm that our model reliably describes within-host variant frequencies in a dataset of 134,682 SARS-CoV-2 deep-sequenced genomes from Massachusetts, USA. We then demonstrate that our reconstruction approach infers transmissions more accurately than two leading methods on synthetic data, as well as in a controlled outbreak of bovine respiratory syncytial virus and an epidemiologically-investigated SARS-CoV-2 outbreak in South Africa. Finally, we apply our transmission reconstruction tool to 5,692 outbreaks among the 134,682 Massachusetts genomes. Our methods and results demonstrate the utility of within-host variation for transmission inference of SARS-CoV-2 and other pathogens, and provide an adaptable mathematical framework for tracking within-host evolution.
Affiliation(s)
- Ivan O. A. Specht
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Harvard College, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Brittany A. Petros
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02139, USA
  - Harvard/MIT MD-PhD Program, Boston, MA 02115, USA
  - Systems, Synthetic, and Quantitative Biology PhD Program, Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Gage K. Moreno
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Taylor Brock-Fisher
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Organismic and Evolutionary Biology, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Lydia A. Krasilnikova
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Paul Cronan
  - Fathom Information Design, Boston, MA 02114, USA
- Daniel J. Park
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Bronwyn L. MacInnis
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
  - Massachusetts Consortium on Pathogen Readiness, Harvard Medical School, Harvard University, Boston, MA 02115, USA
- Al Ozonoff
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ben Fry
  - Fathom Information Design, Boston, MA 02114, USA
- Michael D. Mitzenmacher
  - Department of Computer Science, School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA
- Patrick Varilly
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Pardis C. Sabeti
  - The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Organismic and Evolutionary Biology, Faculty of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA
  - Massachusetts Consortium on Pathogen Readiness, Harvard Medical School, Harvard University, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
5
Ribaud M, Gabriel E, Hughes J, Soubeyrand S. Identifying potential significant factors impacting zero-inflated proportion data. Stat Med 2023; 42:3467-3486. [PMID: 37290435 DOI: 10.1002/sim.9814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 04/03/2023] [Accepted: 05/19/2023] [Indexed: 06/10/2023]
Abstract
Classical supervised methods such as linear regression and decision trees are not well suited to identifying factors that affect a response variable corresponding to zero-inflated proportion data (ZIPD) that are dependent, continuous and bounded. In this article, we propose a within-block permutation-based methodology to identify factors (discrete or continuous) that are significantly correlated with ZIPD, we propose a performance indicator quantifying the percentage of correlation explained by the subset of significant factors, and we show how to predict the ranks of the response variables conditionally on the observation of these factors. The methodology is illustrated on simulated data and on two real data sets dealing with epidemiology. In the first data set, ZIPD correspond to probabilities of transmission of influenza between horses. In the second data set, ZIPD correspond to probabilities that geographic entities (e.g., states and countries) have the same COVID-19 mortality dynamics.
Affiliation(s)
- Joseph Hughes
  - Centre for Virus Research, MRC-University of Glasgow, Glasgow, UK
6
Stockdale JE, Susvitasari K, Tupper P, Sobkowiak B, Mulberry N, Gonçalves da Silva A, Watt AE, Sherry NL, Minko C, Howden BP, Lane CR, Colijn C. Genomic epidemiology offers high resolution estimates of serial intervals for COVID-19. Nat Commun 2023; 14:4830. [PMID: 37563113 PMCID: PMC10415581 DOI: 10.1038/s41467-023-40544-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/24/2022] [Accepted: 07/31/2023] [Indexed: 08/12/2023] Open
Abstract
Serial intervals, the time between symptom onset in infector and infectee, are a fundamental quantity in infectious disease control. However, their estimation requires knowledge of individuals' exposures, typically obtained through resource-intensive contact tracing efforts. We introduce an alternative framework using virus sequences to inform who infected whom and thereby estimate serial intervals. We apply our technique to SARS-CoV-2 sequences from case clusters in the first two COVID-19 waves in Victoria, Australia. We find that our approach offers high resolution, cluster-specific serial interval estimates that are comparable with those obtained from contact data, despite requiring no knowledge of who infected whom and relying on incompletely-sampled data. Compared to a published serial interval, cluster-specific serial intervals can vary estimates of the effective reproduction number by a factor of 2-3. We find that serial interval estimates in settings such as schools and meat processing/packing plants are shorter than those in healthcare facilities.
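Why the serial interval matters for the reproduction number can be sketched with the standard relation R = 1 / M(-r), where r is the exponential growth rate and M is the moment generating function of the interval distribution. The sketch below is an illustration, not the procedure used in the paper: it assumes a gamma-distributed serial interval used as a proxy for the generation interval, and the function name and all numeric values are hypothetical.

```python
def reproduction_number(growth_rate, mean_si, sd_si):
    """R implied by an exponential growth rate and a gamma serial interval.

    Uses R = 1 / M(-r), where M is the moment generating function of the
    interval distribution; for a gamma distribution with shape a and scale b
    this gives R = (1 + r * b) ** a (valid while 1 + r * b > 0).
    """
    shape = (mean_si / sd_si) ** 2
    scale = sd_si ** 2 / mean_si
    return (1.0 + growth_rate * scale) ** shape


# Hypothetical values: the same growth rate (0.15 per day) read through
# a short and a long serial interval (mean and sd in days).
r_short = reproduction_number(0.15, mean_si=3.0, sd_si=2.0)
r_long = reproduction_number(0.15, mean_si=7.0, sd_si=4.0)
```

With the growth rate held fixed, a longer assumed serial interval implies a larger reproduction number, so a cluster-specific interval can shift the estimate noticeably.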
Affiliation(s)
- Paul Tupper
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Nicola Mulberry
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
- Anders Gonçalves da Silva
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology & Immunology, University of Melbourne at the Peter Doherty Institute for Infection & Immunity, Melbourne, VIC, Australia
- Anne E Watt
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology & Immunology, University of Melbourne at the Peter Doherty Institute for Infection & Immunity, Melbourne, VIC, Australia
- Norelle L Sherry
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology & Immunology, University of Melbourne at the Peter Doherty Institute for Infection & Immunity, Melbourne, VIC, Australia
- Corinna Minko
  - Victorian Department of Health, Melbourne, VIC, Australia
- Benjamin P Howden
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology & Immunology, University of Melbourne at the Peter Doherty Institute for Infection & Immunity, Melbourne, VIC, Australia
- Courtney R Lane
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology & Immunology, University of Melbourne at the Peter Doherty Institute for Infection & Immunity, Melbourne, VIC, Australia
- Caroline Colijn
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
7
Robert A, Tsui Lok Hei J, Watson CH, Gsell PS, Hall Y, Rambaut A, Longini IM, Sakoba K, Kucharski AJ, Touré A, Danmadji Nadlaou S, Saidou Barry M, Fofana TO, Lansana Kaba I, Sylla L, Diaby ML, Soumah O, Diallo A, Niare A, Diallo A, Eggo RM, Caroll MW, Henao-Restrepo AM, Edmunds WJ, Hué S. Quantifying the value of viral genomics when inferring who infected whom in the 2014-16 Ebola virus outbreak in Guinea. Virus Evol 2023; 9:vead007. [PMID: 36926449 PMCID: PMC10013732 DOI: 10.1093/ve/vead007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 11/17/2022] [Accepted: 03/06/2023] [Indexed: 03/16/2023] Open
Abstract
Transmission trees can be established through detailed contact histories, statistical or phylogenetic inference, or a combination of methods. Each approach has its limitations, and the extent to which they succeed in revealing a 'true' transmission history remains unclear. In this study, we compared the transmission trees obtained through contact tracing investigations and various inference methods to identify the contribution and value of each approach. We studied eighty-six sequenced cases reported in Guinea between March and November 2015. Contact tracing investigations classified these cases into eight independent transmission chains. We inferred the transmission history from the genetic sequences of the cases (phylogenetic approach), their onset date (epidemiological approach), and a combination of both (combined approach). The inferred transmission trees were then compared to those from the contact tracing investigations. Inference methods using individual data sources (i.e. the phylogenetic analysis and the epidemiological approach) were insufficiently informative to accurately reconstruct the transmission trees and the direction of transmission. The combined approach was able to identify a reduced pool of infectors for each case and highlight likely connections among chains classified as independent by the contact tracing investigations. Overall, the transmissions identified by the contact tracing investigations agreed with the evolutionary history of the viral genomes, even though some cases appeared to be misclassified. Therefore, collecting genetic sequences during an outbreak is key to supplementing the information contained in contact tracing investigations. Although none of the methods we used could identify one unique infector per case, the combined approach highlighted the added value of mixing epidemiological and genetic information to reconstruct who infected whom.
Affiliation(s)
- Alexis Robert
  - Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
- Joseph Tsui Lok Hei
  - Department of Biology, University of Oxford, South Parks Road, Oxford OX1 3RB, UK
- Conall H Watson
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
  - Epidemic Diseases Research Group Oxford, University of Oxford, Old Road Campus, Roosevelt Drive, Oxford OX3 7LG, UK
- Yper Hall
  - UK Health Security Agency, Manor Farm Rd, Porton Down, Salisbury SP4 0JG, UK
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Charlotte Auerbach Road, Edinburgh EH9 3FL, UK
- Ira M Longini
  - Department of Biostatistics, University of Florida, 2004 Mowry Road, 5th Floor CTRB, Gainesville, FL 32611-7450, USA
- Keïta Sakoba
  - World Health Organization Ebola Vaccination Team, Sonfonia T.7, Conakry, Guinea
- Adam J Kucharski
  - Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
- Alhassane Touré
  - World Health Organization Ebola Vaccination Team, Sonfonia T.7, Conakry, Guinea
- Lansana Sylla
  - World Health Organization Ebola Vaccination Team, Sonfonia T.7, Conakry, Guinea
- Ousmane Soumah
  - World Health Organization Ebola Vaccination Team, Sonfonia T.7, Conakry, Guinea
- Abdourahime Diallo
  - World Health Organization Ebola Vaccination Team, Sonfonia T.7, Conakry, Guinea
- Amadou Niare
  - World Health Organization Ebola Vaccination Team, Sonfonia T.7, Conakry, Guinea
- Rosalind M Eggo
  - Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
- Miles W Caroll
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Dr, Headington, Oxford OX3 7BN, UK
- W John Edmunds
  - Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
- Stéphane Hué
  - Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 6HT, UK
8
Van Borm S, Boseret G, Dellicour S, Steensels M, Roupie V, Vandenbussche F, Mathijs E, Vilain A, Driesen M, Dispas M, Delcloo AW, Lemey P, Mertens I, Gilbert M, Lambrecht B, van den Berg T. Combined Phylogeographic Analyses and Epidemiologic Contact Tracing to Characterize Atypically Pathogenic Avian Influenza (H3N1) Epidemic, Belgium, 2019. Emerg Infect Dis 2023; 29:351-359. [PMID: 36692362 PMCID: PMC9881769 DOI: 10.3201/eid2902.220765] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 01/25/2023] Open
Abstract
The high economic impact and zoonotic potential of avian influenza call for detailed investigations of dispersal dynamics of epidemics. We integrated phylogeographic and epidemiologic analyses to investigate the dynamics of a low pathogenicity avian influenza (H3N1) epidemic that occurred in Belgium during 2019. Virus genomes from 104 clinical samples originating from 85% of affected farms were sequenced. A spatially explicit phylogeographic analysis confirmed a dominating northeast to southwest dispersal direction and a long-distance dispersal event linked to direct live animal transportation between farms. Spatiotemporal clustering, transport, and social contacts strongly correlated with the phylogeographic pattern of the epidemic. We detected only a limited association between wind direction and direction of viral lineage dispersal. Our results highlight the multifactorial nature of avian influenza epidemics and illustrate the use of genomic analyses of virus dispersal to complement epidemiologic and environmental data, improve knowledge of avian influenza epidemiologic dynamics, and enhance control strategies.
9
Goldstein IH, Bayer D, Barilar I, Kizito B, Matsiri O, Modongo C, Zetola NM, Niemann S, Minin VM, Shin SS. Using genetic data to identify transmission risk factors: Statistical assessment and application to tuberculosis transmission. PLoS Comput Biol 2022; 18:e1010696. [PMID: 36469509 PMCID: PMC9754595 DOI: 10.1371/journal.pcbi.1010696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 12/15/2022] [Accepted: 10/31/2022] [Indexed: 12/12/2022] Open
Abstract
Identifying host factors that influence infectious disease transmission is an important step toward developing interventions to reduce disease incidence. Recent advances in methods for reconstructing infectious disease transmission events using pathogen genomic and epidemiological data open the door for investigation of host factors that affect onward transmission. While most transmission reconstruction methods are designed to work with densely sampled outbreaks, these methods are making their way into surveillance studies, where the fraction of sampled cases with sequenced pathogens could be relatively low. Surveillance studies that use transmission event reconstruction then use the reconstructed events as response variables (i.e., infection source status of each sampled case) and use host characteristics as predictors (e.g., presence of HIV infection) in regression models. We use simulations to study estimation of the effect of a host factor on probability of being an infection source via this multi-step inferential procedure. Using TransPhylo (a widely used method for Bayesian estimation of infectious disease transmission events) and logistic regression, we find that low sensitivity of identifying infection sources leads to dilution of the signal, biasing logistic regression coefficients toward zero. We show that increasing the proportion of sampled cases improves sensitivity and some, but not all, properties of the logistic regression inference. Application of these approaches to real-world data from a population-based TB study in Botswana fails to detect an association between HIV infection and probability of being a TB infection source. We conclude that application of a pipeline, where one first uses TransPhylo and sparsely sampled surveillance data to infer transmission events and then estimates effects of host characteristics on probabilities of these events, should be accompanied by a realistic simulation study to better understand biases stemming from imprecise transmission event inference.
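The dilution of the signal described in this abstract can be illustrated analytically. The sketch below is not the authors' simulation study and does not use TransPhylo: it assumes non-differential misclassification of infection-source status with a fixed sensitivity and specificity, and computes the log odds ratio that a logistic regression with one binary predictor would target. The function name and all probabilities are hypothetical.

```python
import math


def observed_log_odds_ratio(p_exposed, p_unexposed, sensitivity, specificity=1.0):
    """Log odds ratio for being *labelled* an infection source.

    p_exposed and p_unexposed are the true probabilities of being a source in
    the two host groups. A true source is labelled as one with probability
    `sensitivity`; a non-source is wrongly labelled with probability
    1 - specificity (assumed equal in both groups).
    """
    def observed(p):
        return sensitivity * p + (1.0 - specificity) * (1.0 - p)

    def logit(p):
        return math.log(p / (1.0 - p))

    return logit(observed(p_exposed)) - logit(observed(p_unexposed))


# Hypothetical values: true source probabilities of 0.4 versus 0.2.
true_beta = observed_log_odds_ratio(0.4, 0.2, sensitivity=1.0)
diluted_beta = observed_log_odds_ratio(0.4, 0.2, sensitivity=0.3, specificity=0.95)
```

With these assumed values the coefficient recovered from imperfectly labelled sources keeps its sign but lies closer to zero than the true coefficient.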
Affiliation(s)
- Isaac H. Goldstein
  - Department of Statistics, University of California, Irvine, California, United States of America
- Damon Bayer
  - Department of Statistics, University of California, Irvine, California, United States of America
- Ivan Barilar
  - German Center for Infection Research, Research Center Borstel, Borstel, Germany
- Stefan Niemann
  - German Center for Infection Research, Research Center Borstel, Borstel, Germany
- Volodymyr M. Minin
  - Department of Statistics, University of California, Irvine, California, United States of America
- Sanghyuk S. Shin
  - Sue & Bill Gross School of Nursing, University of California, Irvine, California, United States of America
10
Chao E, Chato C, Vender R, Olabode AS, Ferreira RC, Poon AFY. Molecular source attribution. PLoS Comput Biol 2022; 18:e1010649. [PMID: 36395093 PMCID: PMC9671344 DOI: 10.1371/journal.pcbi.1010649] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022] Open
Affiliation(s)
- Elisa Chao
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Connor Chato
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Reid Vender
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
  - School of Medicine, Queen’s University, Kingston, Ontario, Canada
- Abayomi S. Olabode
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Roux-Cil Ferreira
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
- Art F. Y. Poon
  - Department of Pathology and Laboratory Medicine, Western University, London, Ontario, Canada
11
Skums P, Mohebbi F, Tsyvina V, Baykal PI, Nemira A, Ramachandran S, Khudyakov Y. SOPHIE: Viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework. Cell Syst 2022; 13:844-856.e4. [PMID: 36265470 PMCID: PMC9590096 DOI: 10.1016/j.cels.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 07/05/2022] [Accepted: 07/19/2022] [Indexed: 01/26/2023]
Abstract
Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, the maximum parsimony principle continues to be employed for phylogenetic inference of transmission histories, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes the study of persistent infections in vulnerable groups, where such information is of limited use. Finally, most methods implicitly assume that transmission events are independent, although common source outbreaks violate this assumption. We propose a maximum likelihood framework, SOPHIE, based on the integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modeled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity, and accurately infers transmissions without case-specific epidemiological data.
Collapse
Affiliation(s)
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Fatemeh Mohebbi
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Vyacheslav Tsyvina
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Pelin Icer Baykal
- Department of Biosystems Science & Engineering, ETH Zurich, Basel, Switzerland
- Alina Nemira
- Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Sumathi Ramachandran
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yury Khudyakov
- Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
12
Didelot X, Parkhill J. A scalable analytical approach from bacterial genomes to epidemiology. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210246. [PMID: 35989600 PMCID: PMC9393561 DOI: 10.1098/rstb.2021.0246] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/04/2021] [Accepted: 02/17/2022] [Indexed: 12/21/2022] Open
Abstract
Recent years have seen a remarkable increase in the practicality of sequencing whole genomes from large numbers of bacterial isolates. The availability of this data has huge potential to deliver new insights into the evolution and epidemiology of bacterial pathogens, but the scalability of the analytical methodology has been lagging behind that of the sequencing technology. Here we present a step-by-step approach for such large-scale genomic epidemiology analyses, from bacterial genomes to epidemiological interpretations. A central component of this approach is the dated phylogeny, which is a phylogenetic tree with branch lengths measured in units of time. The construction of dated phylogenies from bacterial genomic data needs to account for the disruptive effect of recombination on phylogenetic relationships, and we describe how this can be achieved. Dated phylogenies can then be used to perform fine-scale or large-scale epidemiological analyses, depending on the proportion of cases for which genomes are available. A key feature of this approach is computational scalability and in particular the ability to process hundreds or thousands of genomes within a matter of hours. This is a clear advantage of the step-by-step approach described here. We discuss other advantages and disadvantages of the approach, as well as potential improvements and avenues for future research. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Julian Parkhill
- Department of Veterinary Medicine, University of Cambridge, Cambridge CB3 0ES, UK
13
Alamil M, Thébaud G, Berthier K, Soubeyrand S. Characterizing viral within-host diversity in fast and non-equilibrium demo-genetic dynamics. Front Microbiol 2022; 13:983938. [PMID: 36274731 PMCID: PMC9581327 DOI: 10.3389/fmicb.2022.983938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022] Open
Abstract
High-throughput sequencing has opened the route for a deep assessment of within-host genetic diversity that can be used, e.g., to characterize microbial communities and to infer transmission links in infectious disease outbreaks. The performance of such characterizations and inferences cannot be analytically assessed in general and is often grounded on computer-intensive evaluations. Being able to simulate within-host genetic diversity across time under various demo-genetic assumptions is therefore paramount to assess the performance of the approaches of interest. In this context, we built an original model that can be simulated to investigate the temporal evolution of genotypes and their frequencies under various demo-genetic assumptions. The model describes the growth and the mutation of genotypes at the nucleotide resolution conditional on an overall within-host viral kinetics, and can be tuned to generate fast non-equilibrium demo-genetic dynamics. We ran simulations of this model and computed classic diversity indices to characterize the temporal variation of within-host genetic diversity (from high-throughput amplicon sequences) of virus populations under three demographic kinetic models of viral infection. Our results highlight how demographic (viral load) and genetic (mutation, selection, or drift) factors drive variations in within-host diversity during the course of an infection. In particular, we observed a non-monotonic relationship between pathogen population size and genetic diversity, and a reduction of the impact of mutation on diversity when a non-specific host immune response is activated. The large variation in the diversity patterns generated in our simulations suggests that the underlying model provides a flexible basis to produce very diverse demo-genetic scenarios and test, for instance, methods for the inference of transmission links during outbreaks.
Affiliation(s)
- Maryam Alamil
- INRAE, BioSP, Avignon, France
- Department of Mathematics and Computer Science, Alfaisal University, Riyadh, Saudi Arabia
- *Correspondence: Maryam Alamil
- Gaël Thébaud
- PHIM Plant Health Institute, INRAE, Univ Montpellier, CIRAD, Institut Agro, IRD, Montpellier, France
14
Leavitt SV, Jenkins HE, Sebastiani P, Lee RS, Horsburgh CR, Tibbs AM, White LF. Estimation of the generation interval using pairwise relative transmission probabilities. Biostatistics 2022; 23:807-824. [PMID: 33527996 PMCID: PMC9291635 DOI: 10.1093/biostatistics/kxaa059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2020] [Revised: 12/07/2020] [Accepted: 12/08/2020] [Indexed: 11/13/2022] Open
Abstract
The generation interval (the time between infection of primary and secondary cases) and its often-used proxy, the serial interval (the time between symptom onset of primary and secondary cases), are critical parameters in understanding infectious disease dynamics. Because it is difficult to determine who infected whom, these important outbreak characteristics are not well understood for many diseases. We present a novel method for estimating transmission intervals using surveillance or outbreak investigation data that, unlike existing methods, does not require contact tracing data or pathogen whole genome sequence data on all cases. We start with an expectation maximization algorithm and incorporate relative transmission probabilities with noise reduction. We use simulations to show that our method can accurately estimate the generation interval distribution for diseases with different reproductive numbers, generation intervals, and mutation rates. We then apply our method to routinely collected surveillance data from Massachusetts (2010-2016) to estimate the serial interval of tuberculosis in this setting.
Affiliation(s)
- Sarah V Leavitt
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
- Helen E Jenkins
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
- Paola Sebastiani
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
- Robyn S Lee
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
- C Robert Horsburgh
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
- Andrew M Tibbs
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
- Laura F White
- Department of Biostatistics, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; Epidemiology Division, University of Toronto Dalla Lana School of Public Health, 155 College St Room 500, Toronto, ON M5T 3M7, Canada; Department of Epidemiology, Boston University School of Public Health, 801 Massachusetts Ave, Boston, MA 02118; and Massachusetts Department of Public Health, 250 Washington St, Boston, MA 02108
15
Colijn C, Earn DJD, Dushoff J, Ogden NH, Li M, Knox N, Van Domselaar G, Franklin K, Jolly G, Otto SP. The need for linked genomic surveillance of SARS-CoV-2. Can Commun Dis Rep 2022; 48:131-139. [PMID: 35480703 PMCID: PMC9017802 DOI: 10.14745/ccdr.v48i04a03] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Genomic surveillance during the coronavirus disease 2019 (COVID-19) pandemic has been key to the timely identification of virus variants with important public health consequences, such as variants that can transmit among and cause severe disease in both vaccinated or recovered individuals. The rapid emergence of the Omicron variant highlighted the speed with which the extent of a threat must be assessed. Rapid sequencing and public health institutions' openness to sharing sequence data internationally give an unprecedented opportunity to do this; however, assessing the epidemiological and clinical properties of any new variant remains challenging. Here we highlight a "band of four" key data sources that can help to detect viral variants that threaten COVID-19 management: 1) genetic (virus sequence) data; 2) epidemiological and geographic data; 3) clinical and demographic data; and 4) immunization data. We emphasize the benefits that can be achieved by linking data from these sources and by combining data from these sources with virus sequence data. The considerable challenges of making genomic data available and linked with virus and patient attributes must be balanced against major consequences of not doing so, especially if new variants of concern emerge and spread without timely detection and action.
Affiliation(s)
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, BC
- David JD Earn
- Department of Mathematics & Statistics and M. G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON
- Jonathan Dushoff
- Department of Biology and M. G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON
- Nicholas H Ogden
- Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, St.-Hyacinthe, QC
- Michael Li
- Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON
- Natalie Knox
- National Microbiology Laboratory, Public Health Agency of Canada and Department of Medical Microbiology & Infectious Diseases, University of Manitoba, Winnipeg, MB
- Gary Van Domselaar
- National Microbiology Laboratory, Public Health Agency of Canada and Department of Medical Microbiology & Infectious Diseases, University of Manitoba, Winnipeg, MB
- Kristyn Franklin
- Centre for Immunization and Respiratory Infectious Diseases, Public Health Agency of Canada, Calgary, AB
- Gordon Jolly
- Public Health Genomics, Public Health Agency of Canada
- Sarah P Otto
- Department of Zoology & Biodiversity Research Centre, University of British Columbia, Vancouver, BC
16
Methods Combining Genomic and Epidemiological Data in the Reconstruction of Transmission Trees: A Systematic Review. Pathogens 2022; 11:252. [PMID: 35215195 PMCID: PMC8875843 DOI: 10.3390/pathogens11020252] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/09/2021] [Revised: 02/08/2022] [Accepted: 02/11/2022] [Indexed: 11/17/2022] Open
Abstract
In order to better understand transmission dynamics and appropriately target control and preventive measures, studies have aimed to identify who-infected-whom in actual outbreaks. Numerous reconstruction methods exist, each with their own assumptions, types of data, and inference strategy. Thus, selecting a method can be difficult. Following PRISMA guidelines, we systematically reviewed the literature for methods combining epidemiological and genomic data in transmission tree reconstruction. We identified 22 methods from the 41 selected articles. We defined three families according to how genomic data was handled: a non-phylogenetic family, a sequential phylogenetic family, and a simultaneous phylogenetic family. We discussed methods according to the data needed as well as the underlying sequence mutation, within-host evolution, transmission, and case observation. In the non-phylogenetic family consisting of eight methods, pairwise genetic distances were estimated. In the phylogenetic families, transmission trees were inferred from phylogenetic trees either simultaneously (nine methods) or sequentially (five methods). While a majority of methods (17/22) modeled the transmission process, few (8/22) took into account imperfect case detection. Within-host evolution was generally (7/8) modeled as a coalescent process. These practical and theoretical considerations were highlighted in order to help select the appropriate method for an outbreak.
17
Dhar S, Zhang C, Măndoiu II, Bansal MS. TNet: Transmission Network Inference Using Within-Host Strain Diversity and its Application to Geographical Tracking of COVID-19 Spread. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:230-242. [PMID: 34255632 PMCID: PMC8956368 DOI: 10.1109/tcbb.2021.3096455] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/06/2020] [Revised: 07/03/2021] [Accepted: 07/08/2021] [Indexed: 06/13/2023]
Abstract
The inference of disease transmission networks is an important problem in epidemiology. One popular approach for building transmission networks is to reconstruct a phylogenetic tree using sequences from disease strains sampled from infected hosts and infer transmissions based on this tree. However, most existing phylogenetic approaches for transmission network inference are highly computationally intensive and cannot take within-host strain diversity into account. Here, we introduce a new phylogenetic approach for inferring transmission networks, TNet, that addresses these limitations. TNet uses multiple strain sequences from each sampled host to infer transmissions and is simpler and more accurate than existing approaches. Furthermore, TNet is highly scalable and able to distinguish between ambiguous and unambiguous transmission inferences. We evaluated TNet on a large collection of 560 simulated transmission networks of various sizes and diverse host, sequence, and transmission characteristics, as well as on 10 real transmission datasets with known transmission histories. Our results show that TNet outperforms two other recently developed methods, phyloscanner and SharpTNI, that also consider within-host strain diversity. We also applied TNet to a large collection of SARS-CoV-2 genomes sampled from infected individuals in many countries around the world, demonstrating how our inference framework can be adapted to accurately infer geographical transmission networks. TNet is freely available from https://compbio.engr.uconn.edu/software/TNet/.
Affiliation(s)
- Saurav Dhar
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Chengchen Zhang
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Ion I. Măndoiu
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Mukul S. Bansal
- Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
18
Gallego-García P, Varela N, Estévez-Gómez N, De Chiara L, Fernández-Silva I, Valverde D, Sapoval N, Treangen TJ, Regueiro B, Cabrera-Alvargonzález JJ, del Campo V, Pérez S, Posada D. OUP accepted manuscript. Virus Evol 2022; 8:veac008. [PMID: 35242361 PMCID: PMC8889950 DOI: 10.1093/ve/veac008] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/05/2021] [Revised: 12/21/2021] [Accepted: 02/04/2022] [Indexed: 11/23/2022] Open
Abstract
A detailed understanding of how and when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission occurs is crucial for designing effective prevention measures. Other than contact tracing, genome sequencing provides information to help infer who infected whom. However, the effectiveness of the genomic approach in this context depends on both (high enough) mutation and (low enough) transmission rates. Today, the level of resolution that we can obtain when describing SARS-CoV-2 outbreaks using just genomic information alone remains unclear. In order to answer this question, we sequenced forty-nine SARS-CoV-2 patient samples from ten local clusters in NW Spain for which partial epidemiological information was available and inferred transmission history using genomic variants. Importantly, we obtained high-quality genomic data, sequencing each sample twice and using unique barcodes to exclude cross-sample contamination. Phylogenetic and cluster analyses showed that consensus genomes were generally sufficient to discriminate among independent transmission clusters. However, levels of intrahost variation were low, which prevented in most cases the unambiguous identification of direct transmission events. After filtering out recurrent variants across clusters, the genomic data were generally compatible with the epidemiological information but did not support specific transmission events over possible alternatives. We estimated the effective transmission bottleneck size to be one to two viral particles for sample pairs whose donor–recipient relationship was likely. Our analyses suggest that intrahost genomic variation in SARS-CoV-2 might be generally limited and that homoplasy and recurrent errors complicate identifying shared intrahost variants. Reliable reconstruction of direct SARS-CoV-2 transmission based solely on genomic data seems hindered by a slow mutation rate, potential convergent events, and technical artifacts. 
Detailed contact tracing seems essential in most cases to study SARS-CoV-2 transmission at high resolution.
Affiliation(s)
- Nair Varela
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Nuria Estévez-Gómez
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Loretta De Chiara
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Iria Fernández-Silva
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, Vigo 36310, Spain
- Diana Valverde
- CINBIO, Universidade de Vigo, Vigo 36310, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, Vigo 36310, Spain
- Benito Regueiro
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Microbiology, Complexo Hospitalario Universitario de Vigo (CHUVI), Sergas, Vigo 36213, Spain
- Microbiology and Parasitology Department, Medicine and Odontology, Universidade de Santiago, Santiago de Compostela 15782, Spain
- Jorge Julio Cabrera-Alvargonzález
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Microbiology, Complexo Hospitalario Universitario de Vigo (CHUVI), Sergas, Vigo 36213, Spain
- Víctor del Campo
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO
- Department of Preventive Medicine, Complexo Hospitalario Universitario de Vigo (CHUVI), Sergas, Vigo 36213, Spain
19
Illingworth CJR, Hamilton WL, Warne B, Routledge M, Popay A, Jackson C, Fieldman T, Meredith LW, Houldcroft CJ, Hosmillo M, Jahun AS, Caller LG, Caddy SL, Yakovleva A, Hall G, Khokhar FA, Feltwell T, Pinckert ML, Georgana I, Chaudhry Y, Curran MD, Parmar S, Sparkes D, Rivett L, Jones NK, Sridhar S, Forrest S, Dymond T, Grainger K, Workman C, Ferris M, Gkrania-Klotsas E, Brown NM, Weekes MP, Baker S, Peacock SJ, Goodfellow IG, Gouliouris T, de Angelis D, Török ME. Superspreaders drive the largest outbreaks of hospital onset COVID-19 infections. eLife 2021; 10:e67308. [PMID: 34425938 PMCID: PMC8384420 DOI: 10.7554/elife.67308] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 02/07/2021] [Accepted: 07/15/2021] [Indexed: 12/11/2022] Open
Abstract
SARS-CoV-2 is notable both for its rapid spread, and for the heterogeneity of its patterns of transmission, with multiple published incidences of superspreading behaviour. Here, we applied a novel network reconstruction algorithm to infer patterns of viral transmission occurring between patients and health care workers (HCWs) in the largest clusters of COVID-19 infection identified during the first wave of the epidemic at Cambridge University Hospitals NHS Foundation Trust, UK. Based upon dates of individuals reporting symptoms, recorded individual locations, and viral genome sequence data, we show an uneven pattern of transmission between individuals, with patients being much more likely to be infected by other patients than by HCWs. Further, the data were consistent with a pattern of superspreading, whereby 21% of individuals caused 80% of transmission events. Our study provides a detailed retrospective analysis of nosocomial SARS-CoV-2 transmission, and sheds light on the need for intensive and pervasive infection control procedures.
Affiliation(s)
- Christopher JR Illingworth
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson Way, Cambridge, United Kingdom
- Institut für Biologische Physik, Universität zu Köln, Köln, Germany
- Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, Cambridge, United Kingdom
| | - William L Hamilton
- University of Cambridge, Department of Medicine, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Ben Warne
- University of Cambridge, Department of Medicine, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Matthew Routledge
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Ashley Popay
- Public Health England Field Epidemiology Unit, Cambridge Institute of Public Health, Forvie Site, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Chris Jackson
- MRC Biostatistics Unit, University of Cambridge, East Forvie Building, Forvie Site, Robinson WayCambridgeUnited Kingdom
| | - Tom Fieldman
- University of Cambridge, Department of Medicine, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Luke W Meredith
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Charlotte J Houldcroft
- University of Cambridge, Department of Medicine, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Myra Hosmillo
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Aminu S Jahun
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Laura G Caller
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Sarah L Caddy
- Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical CentreCambridgeUnited Kingdom
| | - Anna Yakovleva
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Grant Hall
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Fahad A Khokhar
- University of Cambridge, Department of Medicine, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical CentreCambridgeUnited Kingdom
| | - Theresa Feltwell
- University of Cambridge, Department of Medicine, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Malte L Pinckert
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Iliana Georgana
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Yasmin Chaudhry
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Martin D Curran
- Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Surendra Parmar
- Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Dominic Sparkes
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Lucy Rivett
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Nick K Jones
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical CampusCambridgeUnited Kingdom
| | - Sushmita Sridhar
- University of Cambridge, Department of Medicine, Cambridge Biomedical CampusCambridgeUnited Kingdom
- Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical CentreCambridgeUnited Kingdom
- Wellcome Sanger Institute, Wellcome Trust Genome CampusHinxtonUnited Kingdom
| | - Sally Forrest
- Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge, United Kingdom

- Tom Dymond
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Kayleigh Grainger
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Chris Workman
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Mark Ferris
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Effrossyni Gkrania-Klotsas
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- MRC Epidemiology Unit, University of Cambridge, Level 3 Institute of Metabolic Science, Cambridge, United Kingdom
- University of Cambridge, School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Nicholas M Brown
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Michael P Weekes
- University of Cambridge, Department of Medicine, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge, United Kingdom

- Stephen Baker
- University of Cambridge, Department of Medicine, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Cambridge Institute for Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge, United Kingdom

- Sharon J Peacock
- University of Cambridge, Department of Medicine, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
- Public Health England, National Infection Service, London, United Kingdom

- Ian G Goodfellow
- University of Cambridge, Department of Pathology, Division of Virology, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Theodore Gouliouris
- University of Cambridge, Department of Medicine, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Public Health England Clinical Microbiology and Public Health Laboratory, Cambridge Biomedical Campus, Cambridge, United Kingdom

- Daniela de Angelis
- Institut für Biologische Physik, Universität zu Köln, Köln, Germany
- Public Health England, National Infection Service, London, United Kingdom

- M Estée Török
- University of Cambridge, Department of Medicine, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
20
Tonkin-Hill G, Martincorena I, Amato R, Lawson ARJ, Gerstung M, Johnston I, Jackson DK, Park N, Lensing SV, Quail MA, Gonçalves S, Ariani C, Spencer Chapman M, Hamilton WL, Meredith LW, Hall G, Jahun AS, Chaudhry Y, Hosmillo M, Pinckert ML, Georgana I, Yakovleva A, Caller LG, Caddy SL, Feltwell T, Khokhar FA, Houldcroft CJ, Curran MD, Parmar S, Alderton A, Nelson R, Harrison EM, Sillitoe J, Bentley SD, Barrett JC, Torok ME, Goodfellow IG, Langford C, Kwiatkowski D. Patterns of within-host genetic diversity in SARS-CoV-2. eLife 2021; 10:e66857. [PMID: 34387545 PMCID: PMC8363274 DOI: 10.7554/elife.66857] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Received: 01/24/2021] [Accepted: 07/22/2021] [Indexed: 12/15/2022]
Abstract
Monitoring the spread of SARS-CoV-2 and reconstructing transmission chains has become a major public health focus for many governments around the world. The modest mutation rate and rapid transmission of SARS-CoV-2 prevent the reconstruction of transmission chains from consensus genome sequences, but within-host genetic diversity could theoretically help identify close contacts. Here we describe the patterns of within-host diversity in 1181 SARS-CoV-2 samples sequenced to high depth in duplicate. 95.1% of samples show within-host mutations at detectable allele frequencies. Analyses of the mutational spectra revealed strong strand asymmetries suggestive of damage or RNA editing of the plus strand, rather than replication errors, dominating the accumulation of mutations during the SARS-CoV-2 pandemic. Within- and between-host diversity show strong purifying selection, particularly against nonsense mutations. Recurrent within-host mutations, many of which coincide with known phylogenetic homoplasies, display a spectrum and patterns of purifying selection more suggestive of mutational hotspots than recombination or convergent evolution. While allele frequencies suggest that most samples result from infection by a single lineage, we identify multiple putative examples of co-infection. Integrating these results into an epidemiological inference framework, we find that while sharing of within-host variants between samples could help the reconstruction of transmission chains, mutational hotspots and rare cases of superinfection can confound these analyses.
Affiliation(s)
- Naomi Park
- Wellcome Sanger Institute, Hinxton, United Kingdom

- Luke W Meredith
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Grant Hall
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Aminu S Jahun
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Yasmin Chaudhry
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Myra Hosmillo
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Malte L Pinckert
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Iliana Georgana
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Anna Yakovleva
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Laura G Caller
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Sarah L Caddy
- Department of Medicine, University of Cambridge, Cambridge, United Kingdom

- Theresa Feltwell
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Fahad A Khokhar
- Department of Medicine, University of Cambridge, Cambridge, United Kingdom
- Cambridge Institute of Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, United Kingdom

- Ewan M Harrison
- Wellcome Sanger Institute, Hinxton, United Kingdom
- European Bioinformatics Institute, Hinxton, United Kingdom

- M Estee Torok
- Department of Medicine, University of Cambridge, Cambridge, United Kingdom

- Ian G Goodfellow
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom

- Dominic Kwiatkowski
- Wellcome Sanger Institute, Hinxton, United Kingdom
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
21
Didelot X, Kendall M, Xu Y, White PJ, McCarthy N. Genomic Epidemiology Analysis of Infectious Disease Outbreaks Using TransPhylo. Curr Protoc 2021; 1:e60. [PMID: 33617114 PMCID: PMC7995038 DOI: 10.1002/cpz1.60] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 12/14/2022]
Abstract
Comparing the pathogen genomes from several cases of an infectious disease has the potential to help us understand and control outbreaks. Many methods exist to reconstruct a phylogeny from such genomes, which represents how the genomes are related to one another. However, such a phylogeny is not directly informative about transmission events between individuals. TransPhylo is a software tool implemented as an R package designed to bridge the gap between pathogen phylogenies and transmission trees. TransPhylo is based on a combined model of transmission between hosts and pathogen evolution within each host. It can simulate both phylogenies and transmission trees jointly under this combined model. TransPhylo can also reconstruct a transmission tree based on a dated phylogeny, by exploring the space of transmission trees compatible with the phylogeny. A transmission tree can be represented as a coloring of a phylogeny where each color represents a different host of the pathogen, and TransPhylo provides convenient ways to plot these colorings and explore the results. This article presents the basic protocols that can be used to make the most of TransPhylo. © 2021 The Authors. Basic Protocol 1: First steps with TransPhylo; Basic Protocol 2: Simulation of outbreak data; Basic Protocol 3: Inference of transmission; Basic Protocol 4: Exploring the results of inference.
Affiliation(s)
- Xavier Didelot
- School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom

- Michelle Kendall
- School of Life Sciences and Department of Statistics, University of Warwick, United Kingdom

- Yuanwei Xu
- Center for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, United Kingdom

- Peter J. White
- Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- Medical Research Council Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, United Kingdom
- National Institute for Health Research Health Protection Research Unit in Modelling and Health Economics, School of Public Health, Imperial College London, United Kingdom
- Modelling and Economics Unit, National Infection Service, Public Health England, London, United Kingdom

- Noel McCarthy
- Warwick Medical School, University of Warwick, United Kingdom
22
Watson OJ, Okell LC, Hellewell J, Slater HC, Unwin HJT, Omedo I, Bejon P, Snow RW, Noor AM, Rockett K, Hubbart C, Nankabirwa JI, Greenhouse B, Chang HH, Ghani AC, Verity R. Evaluating the Performance of Malaria Genetics for Inferring Changes in Transmission Intensity Using Transmission Modeling. Mol Biol Evol 2021; 38:274-289. [PMID: 32898225 PMCID: PMC7783189 DOI: 10.1093/molbev/msaa225] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022]
Abstract
Substantial progress has been made globally to control malaria; however, there is a growing need for innovative new tools to ensure continued progress. One approach is to harness genetic sequencing and accompanying methodological approaches as have been used in the control of other infectious diseases. However, to utilize these methodologies for malaria, we first need to extend the methods to capture the complex interactions between parasites, human and vector hosts, and environment, which all impact the level of genetic diversity and relatedness of malaria parasites. We develop an individual-based transmission model to simulate malaria parasite genetics parameterized using estimated relationships between complexity of infection and age from five regions in Uganda and Kenya. We predict that cotransmission and superinfection contribute equally to within-host parasite genetic diversity at 11.5% PCR prevalence, above which superinfections dominate. Finally, we characterize the predictive power of six metrics of parasite genetics for detecting changes in transmission intensity, before grouping them in an ensemble statistical model. The model predicted malaria prevalence with a mean absolute error of 0.055. Different assumptions about the availability of sample metadata were considered, with the most accurate predictions of malaria prevalence made when the clinical status and age of sampled individuals are known. Parasite genetics may provide a novel surveillance tool for estimating the prevalence of malaria in areas in which prevalence surveys are not feasible. However, the findings presented here reinforce the need for patient metadata to be recorded and made available within all future attempts to use parasite genetics for surveillance.
Affiliation(s)
- Oliver J Watson
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

- Lucy C Okell
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

- Joel Hellewell
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

- Hannah C Slater
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

- H Juliette T Unwin
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

- Irene Omedo
- KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast, Kilifi, Kenya

- Philip Bejon
- KEMRI-Wellcome Trust Research Programme, Centre for Geographic Medicine Research-Coast, Kilifi, Kenya

- Robert W Snow
- Population Health Unit, Kenya Medical Research Institute-Wellcome Trust Research Programme, Nairobi, Kenya
- Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, United Kingdom

- Kirk Rockett
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

- Christina Hubbart
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

- Joaniter I Nankabirwa
- Infectious Diseases Research Collaboration, Kampala, Uganda
- Makerere University College of Health Sciences, Kampala, Uganda

- Bryan Greenhouse
- Department of Medicine, University of California, San Francisco, San Francisco, CA

- Hsiao-Han Chang
- Center for Communicable Disease Dynamics, Harvard TH Chan School of Public Health, Boston, MA

- Azra C Ghani
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom

- Robert Verity
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
23
Eyre DW, Laager M, Walker AS, Cooper BS, Wilson DJ. Probabilistic transmission models incorporating sequencing data for healthcare-associated Clostridioides difficile outperform heuristic rules and identify strain-specific differences in transmission. PLoS Comput Biol 2021; 17:e1008417. [PMID: 33444378 PMCID: PMC7840057 DOI: 10.1371/journal.pcbi.1008417] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/17/2020] [Revised: 01/27/2021] [Accepted: 10/05/2020] [Indexed: 12/28/2022]
Abstract
Fitting stochastic transmission models to electronic patient data can offer detailed insights into the transmission of healthcare-associated infections and improve infection control. Pathogen whole-genome sequencing may improve the precision of model inferences, but computational constraints have limited modelling applications predominantly to small datasets and specific outbreaks, whereas large-scale sequencing studies have mostly relied on simple rules for identifying/excluding plausible transmission. We present a novel approach for integrating detailed epidemiological data on patient contact networks in hospitals with large-scale pathogen sequencing data. We apply our approach to study Clostridioides difficile transmission using a dataset of 1223 infections in Oxfordshire, UK, 2007-2011. 262 (21% [95% credibility interval 20-22%]) infections were estimated to have been acquired from another known case. There was heterogeneity by sequence type (ST) in the proportion of cases acquired from another case with the highest rates in ST1 (ribotype-027), ST42 (ribotype-106) and ST3 (ribotype-001). These same STs also had higher rates of transmission mediated via environmental contamination/spores persisting after patient discharge/recovery; for ST1 these persisted longer than for most other STs except ST3 and ST42. We also identified variation in transmission between hospitals, medical specialties and over time; by 2011 nearly all transmission from known cases had ceased in our hospitals. Our findings support previous work suggesting only a minority of C. difficile infections are acquired from known cases but highlight a greater role for environmental contamination than previously thought. Our approach is applicable to other healthcare-associated infections. Our findings have important implications for effective control of C. difficile.
Affiliation(s)
- David W. Eyre
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, United Kingdom
- Nuffield Department of Medicine, University of Oxford, United Kingdom
- Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, United Kingdom

- Mirjam Laager
- Nuffield Department of Medicine, University of Oxford, United Kingdom

- A. Sarah Walker
- Nuffield Department of Medicine, University of Oxford, United Kingdom
- Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, University of Oxford, United Kingdom

- Ben S. Cooper
- Nuffield Department of Medicine, University of Oxford, United Kingdom

- Daniel J. Wilson
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, United Kingdom
24
Montazeri H, Little S, Legha MM, Beerenwinkel N, DeGruttola V. Bayesian reconstruction of transmission trees from genetic sequences and uncertain infection times. Stat Appl Genet Mol Biol 2020; 19:/j/sagmb.ahead-of-print/sagmb-2019-0026/sagmb-2019-0026.xml. [PMID: 33085643 PMCID: PMC8212962 DOI: 10.1515/sagmb-2019-0026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/20/2019] [Accepted: 09/16/2020] [Indexed: 11/15/2022]
Abstract
Genetic sequence data of pathogens are increasingly used to investigate transmission dynamics in both endemic diseases and disease outbreaks. Such research can aid in the development of appropriate interventions and in the design of studies to evaluate them. Several computational methods have been proposed to infer transmission chains from sequence data; however, existing methods do not generally reliably reconstruct transmission trees because genetic sequence data or inferred phylogenetic trees from such data contain insufficient information for accurate estimation of transmission chains. Here, we show by simulation studies that incorporating infection times, even when they are uncertain, can greatly improve the accuracy of reconstruction of transmission trees. To achieve this improvement, we propose a Bayesian inference method using Markov chain Monte Carlo that directly draws samples from the space of transmission trees under the assumption of complete sampling of the outbreak. The likelihood of each transmission tree is computed by a phylogenetic model by treating its internal nodes as transmission events. By a simulation study, we demonstrate that the accuracy of the reconstructed transmission trees depends mainly on the amount of information available on times of infection; we show the superiority of the proposed method to two alternative approaches when infection times are known up to specified degrees of certainty. In addition, we illustrate the use of a multiple imputation framework to study features of epidemic dynamics, such as the relationship between characteristics of nodes and average number of outbound edges or inbound edges, signifying possible transmission events from and to nodes. We apply the proposed method to a transmission cluster in San Diego and to a dataset from the 2014 Sierra Leone Ebola virus outbreak and investigate the impact of biological, behavioral, and demographic factors.
Affiliation(s)
- Hesam Montazeri
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

- Susan Little
- Department of Medicine, University of California San Diego, California, USA

- Mozhgan Mozaffari Legha
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
25
Boskova V, Stadler T. PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences. Mol Biol Evol 2020; 37:3061-3075. [PMID: 32492139 PMCID: PMC7530608 DOI: 10.1093/molbev/msaa136] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022]
Abstract
Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
Affiliation(s)
- Veronika Boskova
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Switzerland
- Center for Integrative Bioinformatics Vienna, Max Perutz Labs, University of Vienna and Medical University of Vienna, Vienna, Austria

- Tanja Stadler
- Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
- Swiss Institute of Bioinformatics (SIB), Switzerland
26
What Should Health Departments Do with HIV Sequence Data? Viruses 2020; 12:v12091018. [PMID: 32932642 PMCID: PMC7551807 DOI: 10.3390/v12091018] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/04/2020] [Revised: 09/09/2020] [Accepted: 09/11/2020] [Indexed: 11/27/2022]
Abstract
Many countries and US states have mandatory statutes that require reporting of HIV clinical data, including genetic sequencing results, to public health departments. Because genetic sequencing is a part of routine care for HIV-infected persons, health departments have extensive sequence collections spanning years and even decades of the HIV epidemic. How should these data be used (or not) in public health practice? This is a complex, multi-faceted question that weighs personal risks against public health benefit. The answer is neither straightforward nor universal. However, to make that judgement (of how genetic sequence data should be used in describing and combating the HIV epidemic) we need a clear image of what a phylogenetically enhanced HIV surveillance system can do and what benefit it might provide. In this paper, we present a positive case for how up-to-date analysis of HIV sequence databases managed by health departments can provide unique and actionable information on how HIV is spreading in local communities. We discuss this question broadly, with examples from the US, as it is globally relevant for all health authorities that collect HIV genetic data.
27
Firestone SM, Hayama Y, Lau MSY, Yamamoto T, Nishi T, Bradhurst RA, Demirhan H, Stevenson MA, Tsutsui T. Transmission network reconstruction for foot-and-mouth disease outbreaks incorporating farm-level covariates. PLoS One 2020; 15:e0235660. [PMID: 32667952 PMCID: PMC7363093 DOI: 10.1371/journal.pone.0235660] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/26/2019] [Accepted: 06/22/2020] [Indexed: 11/19/2022]
Abstract
Transmission network modelling to infer 'who infected whom' in infectious disease outbreaks is a highly active area of research. Outbreaks of foot-and-mouth disease have been a key focus of transmission network models that integrate genomic and epidemiological data. The aim of this study was to extend Lau's systematic Bayesian inference framework to incorporate additional parameters representing predominant species and numbers of animals held on a farm. Lau's Bayesian Markov chain Monte Carlo algorithm was reformulated, verified and pseudo-validated on 100 simulated outbreaks populated with demographic data from Japan and Australia. The modified model was then implemented on genomic and epidemiological data from the 2010 outbreak of foot-and-mouth disease in Japan, and outputs compared to those from the SCOTTI model implemented in BEAST2. The modified model achieved improvements in overall accuracy when tested on the simulated outbreaks. When implemented on the actual outbreak data from Japan, infected farms that held predominantly pigs were estimated to have five times the transmissibility of infected cattle farms and be 49% less susceptible. The farm-level incubation period was 1 day shorter than the latent period; the timing of the seeding of the outbreak in Japan was inferred, as were key linkages between clusters and features of farms involved in widespread dissemination of this outbreak. To improve accessibility, the modified model has been implemented as the R package 'BORIS' for use in future outbreaks.
Affiliation(s)
- Simon M. Firestone
- Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, Australia

- Yoko Hayama
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, Japan

- Max S. Y. Lau
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America

- Takehisa Yamamoto
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, Japan

- Tatsuya Nishi
- Exotic Disease Research Station, National Institute of Animal Health, National Agriculture and Food Research Organization, Kodaira, Tokyo, Japan

- Richard A. Bradhurst
- Centre of Excellence for Biosecurity Risk Analysis, The University of Melbourne, Parkville, VIC, Australia

- Haydar Demirhan
- Mathematical Sciences Discipline, School of Science, RMIT University, Melbourne, VIC, Australia

- Mark A. Stevenson
- Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, Victoria, Australia

- Toshiyuki Tsutsui
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, Japan
28
Abstract
MOTIVATION: The combination of genomic and epidemiological data holds the potential to enable accurate pathogen transmission history inference. However, the inference of outbreak transmission histories remains challenging due to various factors such as within-host pathogen diversity and multi-strain infections. Current computational methods ignore within-host diversity and/or multi-strain infections, often failing to accurately infer the transmission history. Thus, there is a need for efficient computational methods for transmission tree inference that accommodate the complexities of real data. RESULTS: We formulate the direct transmission inference (DTI) problem for inferring transmission trees that support multi-strain infections given a timed phylogeny and additional epidemiological data. We establish hardness for the decision and counting version of the DTI problem. We introduce Transmission Tree Uniform Sampler (TiTUS), a method that uses SATISFIABILITY to almost uniformly sample from the space of transmission trees. We introduce criteria that prioritize parsimonious transmission trees that we subsequently summarize using a novel consensus tree approach. We demonstrate TiTUS's ability to accurately reconstruct transmission trees on simulated data as well as a documented HIV transmission chain. AVAILABILITY AND IMPLEMENTATION: https://github.com/elkebir-group/TiTUS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Palash Sashittal
- Department of Aerospace Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

- Mohammed El-Kebir
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
29
Alamil M, Hughes J, Berthier K, Desbiez C, Thébaud G, Soubeyrand S. Inferring epidemiological links from deep sequencing data: a statistical learning approach for human, animal and plant diseases. Philos Trans R Soc Lond B Biol Sci 2020; 374:20180258. [PMID: 31056055 PMCID: PMC6553606 DOI: 10.1098/rstb.2018.0258] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/23/2022]
Abstract
Pathogen sequence data have been exploited to infer who infected whom, by using empirical and model-based approaches. Most of these approaches exploit one pathogen sequence per infected host (e.g. individual, household, field). However, modern sequencing techniques can reveal the polymorphic nature of within-host populations of pathogens. Thus, these techniques provide a subsample of the pathogen variants that were present in the host at the sampling time. Such data are expected to give more insight on epidemiological links than a single sequence per host. In general, a mechanistic viewpoint to transmission and micro-evolution has been followed to infer epidemiological links from these data. Here, we investigate an alternative approach grounded on statistical learning. The idea consists of learning the structure of epidemiological links with a pseudo-evolutionary model applied to training data obtained from contact tracing, for example, and using this initial stage to infer links for the whole dataset. Such an approach has the potential to be particularly valuable where there is a risk of erroneous mechanistic assumptions; it is sufficiently parsimonious to allow the handling of big datasets in the future, and it is versatile enough to be applied to very different contexts from animal, human and plant epidemiology. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’. This issue is linked with the subsequent theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’.
Affiliation(s)
- M Alamil
- BioSP, INRA, 84914 Avignon, France

- J Hughes
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK

- K Berthier
- Pathologie Végétale, INRA, 84140 Montfavet, France

- C Desbiez
- Pathologie Végétale, INRA, 84140 Montfavet, France

- G Thébaud
- BGPI, INRA, Univ. Montpellier, SupAgro, Cirad, 34398 Montpellier, France
30
Abstract
In 1918, a strain of influenza A virus caused a human pandemic resulting in the deaths of 50 million people. A century later, with the advent of sequencing technology and corresponding phylogenetic methods, we know much more about the origins, evolution and epidemiology of influenza epidemics. Here we review the history of avian influenza viruses through the lens of their genetic makeup: from their relationship to human pandemic viruses, starting with the 1918 H1N1 strain, through to the highly pathogenic epidemics in birds and zoonoses up to 2018. We describe the genesis of novel influenza A virus strains by reassortment and evolution in wild and domestic bird populations, as well as the role of wild bird migration in their long-range spread. The emergence of highly pathogenic avian influenza viruses, and the zoonotic incursions of avian H5 and H7 viruses into humans over the last couple of decades are also described. The threat of a new avian influenza virus causing a human pandemic is still present today, although control in domestic avian populations can minimize the risk to human health. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’. This issue is linked with the subsequent theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’.
Affiliation(s)
- Paul Digard
- The Roslin Institute, University of Edinburgh, Edinburgh, UK
|
31
|
Xu Y, Cancino-Muñoz I, Torres-Puente M, Villamayor LM, Borrás R, Borrás-Máñez M, Bosque M, Camarena JJ, Colomer-Roig E, Colomina J, Escribano I, Esparcia-Rodríguez O, Gil-Brusola A, Gimeno C, Gimeno-Gascón A, Gomila-Sard B, González-Granda D, Gonzalo-Jiménez N, Guna-Serrano MR, López-Hontangas JL, Martín-González C, Moreno-Muñoz R, Navarro D, Navarro M, Orta N, Pérez E, Prat J, Rodríguez JC, Ruiz-García MM, Vanaclocha H, Colijn C, Comas I. High-resolution mapping of tuberculosis transmission: Whole genome sequencing and phylogenetic modelling of a cohort from Valencia Region, Spain. PLoS Med 2019; 16:e1002961. [PMID: 31671150 PMCID: PMC6822721 DOI: 10.1371/journal.pmed.1002961] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Accepted: 10/07/2019] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND Whole genome sequencing provides better delineation of transmission clusters in Mycobacterium tuberculosis than traditional methods. However, its ability to reveal individual transmission links within clusters is limited. Here, we used a 2-step approach based on Bayesian transmission reconstruction to (1) identify likely index and missing cases, (2) determine risk factors associated with transmitters, and (3) estimate when transmission happened. METHODS AND FINDINGS We developed our transmission reconstruction method using genomic and epidemiological data from a population-based study from Valencia Region, Spain. Tuberculosis (TB) incidence during the study period was 8.4 cases per 100,000 people. While the study is ongoing, the sampling frame for this work includes notified TB cases between 1 January 2014 and 31 December 2016. We identified a total of 21 transmission clusters that fulfilled the criteria for analysis. These contained a total of 117 individuals diagnosed with active TB (109 with epidemiological data). Demographic characteristics of the study population were as follows: 80/109 (73%) individuals were Spanish-born, 76/109 (70%) individuals were men, and the mean age was 42.51 years (SD 18.46). We found that 66/109 (61%) TB patients were sputum positive at diagnosis, and 10/109 (9%) were HIV positive. We used the data to reveal individual transmission links, and to identify index cases, missing cases, likely transmitters, and associated transmission risk factors. Our Bayesian inference approach suggests that at least 60% of index cases are likely misidentified by local public health. Our data also suggest that factors associated with likely transmitters are different to those of simply being in a transmission cluster, highlighting the importance of differentiating between these 2 phenomena. Our data suggest that type 2 diabetes mellitus is a risk factor associated with being a transmitter (odds ratio 0.19 [95% CI 0.02-1.10], p < 0.003). 
Finally, we used the most likely timing for transmission events to study when TB transmission occurred; we identified that 5/14 (35.7%) cases likely transmitted TB well before symptom onset, and these were largely sputum negative at diagnosis. Limited within-cluster diversity does not allow us to extrapolate our findings to the whole TB population in Valencia Region. CONCLUSIONS In this study, we found that index cases are often misidentified, with downstream consequences for epidemiological investigations because likely transmitters can be missed. Our findings regarding inferred transmission timing suggest that TB transmission can occur before patient symptom onset, suggesting also that TB transmits during sub-clinical disease. This result has direct implications for diagnosing TB and reducing transmission. Overall, we show that a transition to individual-based genomic epidemiology will likely close some of the knowledge gaps in TB transmission and may redirect efforts towards cost-effective contact investigations for improved TB control.
Affiliation(s)
- Yuanwei Xu
- Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, United Kingdom
- Irving Cancino-Muñoz
- Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Valencia, Spain
- Manuela Torres-Puente
- Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Valencia, Spain
- Rafael Borrás
- Microbiology Service, Hospital Clínico Universitario, Valencia, Spain
- María Borrás-Máñez
- Microbiology and Parasitology Service, Hospital Universitario de La Ribera, Alzira, Spain
- Juan J. Camarena
- Microbiology Service, Hospital Universitario Dr. Peset, Valencia, Spain
- Ester Colomer-Roig
- Genomics and Health Unit, FISABIO Public Health, Valencia, Spain
- Microbiology Service, Hospital Universitario Dr. Peset, Valencia, Spain
- Javier Colomina
- Microbiology and Parasitology Service, Hospital Universitario de La Ribera, Alzira, Spain
- Isabel Escribano
- Microbiology Laboratory, Hospital Virgen de los Lírios, Alcoy, Spain
- Ana Gil-Brusola
- Microbiology Service, Hospital Universitari i Politècnic La Fe, Valencia, Spain
- Concepción Gimeno
- Microbiology Service, Hospital General Universitario de Valencia, Valencia, Spain
- Bárbara Gomila-Sard
- Microbiology Service, Hospital General Universitario de Castellón, Castellon, Spain
- Coral Martín-González
- Microbiology Service, Hospital Universitario de San Juan de Alicante, Alicante, Spain
- Rosario Moreno-Muñoz
- Microbiology Service, Hospital General Universitario de Castellón, Castellon, Spain
- David Navarro
- Microbiology Service, Hospital Clínico Universitario, Valencia, Spain
- María Navarro
- Microbiology Service, Hospital de la Vega Baixa, Orihuela, Spain
- Nieves Orta
- Microbiology Service, Hospital San Francesc de Borja, Gandía, Spain
- Elvira Pérez
- Subdirección General de Epidemiología y Vigilancia de la Salud, Dirección General de Salud Pública, Valencia, Spain
- Josep Prat
- Microbiology Service, Hospital de Sagunto, Sagunto, Spain
- Herme Vanaclocha
- Subdirección General de Epidemiología y Vigilancia de la Salud, Dirección General de Salud Pública, Valencia, Spain
- Caroline Colijn
- Centre for Mathematics of Precision Healthcare, Department of Mathematics, Imperial College London, London, United Kingdom
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Iñaki Comas
- Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas, Valencia, Spain
|
32
|
Theys K, Lemey P, Vandamme AM, Baele G. Advances in Visualization Tools for Phylogenomic and Phylodynamic Studies of Viral Diseases. Front Public Health 2019; 7:208. [PMID: 31428595 PMCID: PMC6688121 DOI: 10.3389/fpubh.2019.00208] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2019] [Accepted: 07/12/2019] [Indexed: 01/28/2023] Open
Abstract
Genomic and epidemiological monitoring have become an integral part of our response to emerging and ongoing epidemics of viral infectious diseases. Advances in high-throughput sequencing, including portable genomic sequencing at reduced costs and turnaround time, are paralleled by continuing developments in methodology to infer evolutionary histories (dynamics/patterns) and to identify factors driving viral spread in space and time. The traditionally static nature of visualizing phylogenetic trees that represent these evolutionary relationships/processes has also evolved, albeit perhaps at a slower rate. Advanced visualization tools with increased resolution assist in drawing conclusions from phylogenetic estimates and may even have potential to better inform public health and treatment decisions, but the design (and choice of what analyses are shown) is hindered by the complexity of information embedded within current phylogenetic models and the integration of available meta-data. In this review, we discuss visualization challenges for the interpretation and exploration of reconstructed histories of viral epidemics that arose from increasing volumes of sequence data and the wealth of additional data layers that can be integrated. We focus on solutions that address joint temporal and spatial visualization but also consider what the future may bring in terms of visualization and how this may become of value for the coming era of real-time digital pathogen surveillance, where actionable results and adequate intervention strategies need to be obtained within days.
Affiliation(s)
- Kristof Theys
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
- Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
- Anne-Mieke Vandamme
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
- Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Clinical and Epidemiological Virology, KU Leuven, Leuven, Belgium
|
33
|
Abstract
One approach to the reconstruction of infectious disease transmission trees from pathogen genomic data has been to use a phylogenetic tree, reconstructed from pathogen sequences, and annotate its internal nodes to provide a reconstruction of which host each lineage was in at each point in time. If only one pathogen lineage can be transmitted to a new host (i.e., the transmission bottleneck is complete), this corresponds to partitioning the nodes of the phylogeny into connected regions, each of which represents evolution in an individual host. These partitions define the possible transmission trees that are consistent with a given phylogenetic tree. However, the mathematical properties of the transmission trees given a phylogeny remain largely unexplored. Here, we describe a procedure to calculate the number of possible transmission trees for a given phylogeny, and we then show how to uniformly sample from these transmission trees. The procedure is outlined for situations where one sample is available from each host and trees do not have branch lengths, and we also provide extensions for incomplete sampling, multiple sampling, and the application to time trees in a situation where limits on the period during which each host could have been infected and infectious are known. The sampling algorithm is available as an R package (STraTUS).
Affiliation(s)
- Matthew D Hall
- Nuffield Department of Medicine, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, Canada
|
34
|
Chaters GL, Johnson PCD, Cleaveland S, Crispell J, de Glanville WA, Doherty T, Matthews L, Mohr S, Nyasebwa OM, Rossi G, Salvador LCM, Swai E, Kao RR. Analysing livestock network data for infectious disease control: an argument for routine data collection in emerging economies. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180264. [PMID: 31104601 PMCID: PMC6558568 DOI: 10.1098/rstb.2018.0264] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/13/2019] [Indexed: 11/12/2022] Open
Abstract
Livestock movements are an important mechanism of infectious disease transmission. Where these are well recorded, network analysis tools have been used to successfully identify system properties, highlight vulnerabilities to transmission, and inform targeted surveillance and control. Here we highlight the main uses of network properties in understanding livestock disease epidemiology and discuss statistical approaches to infer network characteristics from biased or fragmented datasets. We use a 'hurdle model' approach that predicts (i) the probability of movement and (ii) the number of livestock moved to generate synthetic 'complete' networks of movements between administrative wards, exploiting routinely collected government movement permit data from northern Tanzania. We demonstrate that this model captures a significant amount of the observed variation. Combining the cattle movement network with a spatial between-ward contact layer, we create a multiplex, over which we simulated the spread of 'fast' ( R0 = 3) and 'slow' ( R0 = 1.5) pathogens, and assess the effects of random versus targeted disease control interventions (vaccination and movement ban). The targeted interventions substantially outperform those randomly implemented for both fast and slow pathogens. Our findings provide motivation to encourage routine collection and centralization of movement data to construct representative networks. This article is part of the theme issue 'Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control'. This theme issue is linked with the earlier issue 'Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes'.
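The two-part structure of a hurdle model can be made concrete with a short sketch. The abstract does not say which count distribution was used for the number of animals moved, so the zero-truncated Poisson below is an assumption chosen purely for illustration, and the names `hurdle_pmf`, `p_move` and `lam` are hypothetical.

```python
import math


def zero_truncated_poisson_pmf(k: int, lam: float) -> float:
    """P(K = k | K >= 1) for a Poisson(lam) count (assumed count model)."""
    if k < 1:
        return 0.0
    # Work in log space: log Poisson pmf minus log P(K >= 1).
    log_pmf = (
        k * math.log(lam)
        - lam
        - math.lgamma(k + 1)
        - math.log1p(-math.exp(-lam))
    )
    return math.exp(log_pmf)


def hurdle_pmf(k: int, p_move: float, lam: float) -> float:
    """Probability that k animals move between two wards.

    Part (i): a movement occurs at all with probability p_move.
    Part (ii): given a movement, the count follows a zero-truncated Poisson.
    """
    if k == 0:
        return 1.0 - p_move
    return p_move * zero_truncated_poisson_pmf(k, lam)


def hurdle_mean(p_move: float, lam: float) -> float:
    """Expected number of animals moved under the same assumptions."""
    return p_move * lam / (1.0 - math.exp(-lam))
```

Separating the zero from the positive counts lets the many ward pairs with no recorded movement be modelled independently of how large a movement is once it happens. In practice both `p_move` and `lam` would depend on covariates, which this sketch leaves out.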
Affiliation(s)
- G. L. Chaters
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- P. C. D. Johnson
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- S. Cleaveland
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- J. Crispell
- School of Veterinary Medicine, University College Dublin, Dublin, Ireland
- W. A. de Glanville
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- T. Doherty
- Royal (Dick) School of Veterinary Studies and Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- L. Matthews
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- S. Mohr
- Boyd Orr Centre for Population and Ecosystem Health, Institute of Biodiversity, Animal Health and Comparative Medicine, University of Glasgow, Glasgow G12 8QQ, UK
- O. M. Nyasebwa
- Department of Veterinary Services, Ministry of Livestock and Fisheries, Nelson Mandela Road, Dar Es Salaam, Tanzania
- G. Rossi
- Royal (Dick) School of Veterinary Studies and Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- L. C. M. Salvador
- Royal (Dick) School of Veterinary Studies and Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
- Department of Infectious Diseases, University of Georgia, Athens, GA 30602, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- E. Swai
- Department of Veterinary Services, Ministry of Livestock and Fisheries, Nelson Mandela Road, Dar Es Salaam, Tanzania
- R. R. Kao
- Royal (Dick) School of Veterinary Studies and Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
|
35
|
Stimson J, Gardy J, Mathema B, Crudu V, Cohen T, Colijn C. Beyond the SNP Threshold: Identifying Outbreak Clusters Using Inferred Transmissions. Mol Biol Evol 2019; 36:587-603. [PMID: 30690464 DOI: 10.1093/molbev/msy242] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023] Open
Abstract
Whole-genome sequencing (WGS) is increasingly used to aid the understanding of pathogen transmission. A first step in analyzing WGS data is usually to define "transmission clusters," sets of cases that are potentially linked by direct transmission. This is often done by including two cases in the same cluster if they are separated by fewer single-nucleotide polymorphisms (SNPs) than a specified threshold. However, there is little agreement as to what an appropriate threshold should be. We propose a probabilistic alternative, suggesting that the key inferential target for transmission clusters is the number of transmissions separating cases. We characterize this by combining the number of SNP differences and the length of time over which those differences have accumulated, using information about case timing, molecular clock, and transmission processes. Our framework has the advantage of allowing for variable mutation rates across the genome and can incorporate other epidemiological data. We use two tuberculosis studies to illustrate the impact of our approach: with British Columbia data by using spatial divisions; with Republic of Moldova data by incorporating antibiotic resistance. Simulation results indicate that our transmission-based method is better in identifying direct transmissions than a SNP threshold, with dissimilarity between clusterings of on average 0.27 bits compared with 0.37 bits for the SNP-threshold method and 0.84 bits for randomly permuted data. These results show that it is likely to outperform the SNP-threshold method where clock rates are variable and sample collection times are spread out. We implement the method in the R package transcluster.
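For orientation, the SNP-threshold baseline that this abstract argues against can be sketched in a few lines. This is not the transcluster method, only the simple rule described above: two cases are linked when their genomes differ at fewer sites than the threshold, and clusters are the connected groups that result. The function names are hypothetical, and the sketch assumes equal-length aligned sequences with no ambiguous bases.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def threshold_clusters(seqs: dict[str, str], threshold: int) -> list[set[str]]:
    """Group cases into clusters by linking pairs fewer than `threshold` SNPs apart."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    # Sort for a deterministic return order.
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

Because linkage is transitive, two cases can share a cluster even when their own distance meets or exceeds the threshold, provided a chain of closer cases connects them. The rule also ignores how much time separates the samples, which is the gap the probabilistic approach in the abstract is meant to address.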
Affiliation(s)
- James Stimson
- Department of Mathematics, Imperial College London, London, UK
- Jennifer Gardy
- British Columbia Centre for Disease Control, Communicable Disease Prevention and Control Services, Vancouver, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, Canada
- Barun Mathema
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York, USA
- Valeriu Crudu
- Phthisiopneumology Institute, Chisinau, Republic of Moldova
- Ted Cohen
- Yale University School of Public Health, New Haven
- Caroline Colijn
- Department of Mathematics, Imperial College London, London, UK
- Department of Mathematics, Simon Fraser University, Vancouver, Canada
|
36
|
Abeler-Dörner L, Grabowski MK, Rambaut A, Pillay D, Fraser C. PANGEA-HIV 2: Phylogenetics And Networks for Generalised Epidemics in Africa. Curr Opin HIV AIDS 2019; 14:173-180. [PMID: 30946141 PMCID: PMC6629166 DOI: 10.1097/coh.0000000000000542] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023]
Abstract
PURPOSE OF REVIEW The HIV epidemic in sub-Saharan Africa is far from being under control and the ambitious UNAIDS targets are unlikely to be met by 2020, as declines in per-capita incidence are being largely offset by demographic trends. There is an increasing number of proven and specific HIV prevention tools, but little consensus on how best to deploy them. RECENT FINDINGS Traditionally, phylogenetics has been used in HIV research to reconstruct the history of the epidemic and date zoonotic infections, whereas more recent publications focus on HIV diversity and drug resistance. However, it is also the most powerful method of source attribution available for the study of HIV transmission. The PANGEA (Phylogenetics And Networks for Generalized Epidemics in Africa) consortium has generated over 18 000 NGS HIV sequences from five countries in sub-Saharan Africa. Using phylogenetic methods, we will identify characteristics of individuals or groups, which are most likely to be at risk of infection or at risk of infecting others. SUMMARY Combining phylogenetics, phylodynamics and epidemiology will allow PANGEA to highlight where prevention efforts should be focussed to reduce the HIV epidemic most effectively. To maximise the public health benefit of the data, PANGEA offers accreditation to external researchers, allowing them to access the data and join the consortium. We also welcome submissions of other HIV sequences from sub-Saharan Africa to the database.
Affiliation(s)
- Lucie Abeler-Dörner
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Mary K. Grabowski
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Rakai Health Sciences Program, Baltimore, USA
- Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, UK
- Deenan Pillay
- Africa Health Research Institute, KwaZulu-Natal, South Africa
- Division of Infection and Immunity, University College London, London, UK
- Christophe Fraser
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
|
37
|
Gilbertson MLJ, Fountain-Jones NM, Craft ME. Incorporating genomic methods into contact networks to reveal new insights into animal behavior and infectious disease dynamics. BEHAVIOUR 2019; 155:759-791. [PMID: 31680698 DOI: 10.1163/1568539x-00003471] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
Utilization of contact networks has provided opportunities for assessing the dynamic interplay between pathogen transmission and host behavior. Genomic techniques have, in their own right, provided new insight into complex questions in disease ecology, and the increasing accessibility of genomic approaches means more researchers may seek out these tools. The integration of network and genomic approaches provides opportunities to examine the interaction between behavior and pathogen transmission in new ways and with greater resolution. While a number of studies have begun to incorporate both contact network and genomic approaches, a great deal of work has yet to be done to better integrate these techniques. In this review, we give a broad overview of how network and genomic approaches have each been used to address questions regarding the interaction of social behavior and infectious disease, and then discuss current work and future horizons for the merging of these techniques.
Affiliation(s)
- Marie L J Gilbertson
- Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Nicholas M Fountain-Jones
- Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Meggan E Craft
- Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, Minnesota 55455, USA
|
38
|
Firestone SM, Hayama Y, Bradhurst R, Yamamoto T, Tsutsui T, Stevenson MA. Reconstructing foot-and-mouth disease outbreaks: a methods comparison of transmission network models. Sci Rep 2019; 9:4809. [PMID: 30886211 PMCID: PMC6423326 DOI: 10.1038/s41598-019-41103-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2018] [Accepted: 02/28/2019] [Indexed: 12/22/2022] Open
Abstract
A number of transmission network models are available that combine genomic and epidemiological data to reconstruct networks of who infected whom during infectious disease outbreaks. For such models to reliably inform decision-making they must be transparently validated, robust, and capable of producing accurate predictions within the short data collection and inference timeframes typical of outbreak responses. A lack of transparent multi-model comparisons reduces confidence in the accuracy of transmission network model outputs, negatively impacting on their more widespread use as decision-support tools. We undertook a formal comparison of the performance of nine published transmission network models based on a set of foot-and-mouth disease outbreaks simulated in a previously free country, with corresponding simulated phylogenies and genomic samples from animals on infected premises. Of the transmission network models tested, Lau’s systematic Bayesian integration framework was found to be the most accurate for inferring the transmission network and timing of exposures, correctly identifying the source of 73% of the infected premises (with 91% accuracy for sources with model support >0.80). The Structured COalescent Transmission Tree Inference provided the most accurate inference of molecular clock rates. This validation study points to which models might be reliably used to reconstruct similar future outbreaks and how to interpret the outputs to inform control. Further research could involve extending the best-performing models to explicitly represent within-host diversity so they can handle next-generation sequencing data, incorporating additional animal and farm-level covariates and combining predictions using Ensemble methods and other approaches.
Affiliation(s)
- Simon M Firestone
- Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
- Yoko Hayama
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, 305-0856, Japan
- Richard Bradhurst
- Centre of Excellence for Biosecurity Risk Analysis, The University of Melbourne, Parkville, VIC, 3010, Australia
- Takehisa Yamamoto
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, 305-0856, Japan
- Toshiyuki Tsutsui
- Viral Disease and Epidemiology Research Division, National Institute of Animal Health, National Agriculture Research Organization, Tsukuba, Ibaraki, 305-0856, Japan
- Mark A Stevenson
- Asia-Pacific Centre for Animal Health, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC, 3010, Australia
|
39
|
Abstract
OBJECTIVES Molecular epidemiology is applied to various aspects of HIV transmission analyses. Ultradeep sequencing (UDS) permits in-depth characterization of transmission episodes involving minority variants. We explored HIV-1 epidemiological linkage and evaluated characteristics of transmission dynamics and transmitted drug resistance (TDR) detection through the added value of UDS. DESIGN HIV pol gene fragments were sequenced by UDS and Sanger sequencing on samples of 70 HIV-1-infected, treatment-naive, recently diagnosed MSM. METHODS Pairwise genetic distances and maximum likelihood phylogenies were computed. Transmission events were identified as clades with branch support of at least 70% and intraclade genetic difference of less than 4.5%. TDR mutations were recognized from the TDR consensus list. Transmission directionality, directness and inoculum size were inferred from tree topologies. RESULTS Both datasets concurred in the identification of seven transmission pairs and one cluster of three patients. With UDS, direction of transmission was inferred in four out of eight chains. Evidence for multiple founder viruses was found in two out of eight chains. No transmission of minority-resistant variants was evidenced. TDR mutation prevalence in protease and reverse transcriptase fragments was 4.3% with Sanger sequencing and 18.6% with UDS. CONCLUSION Although Sanger sequencing and UDS identified the same transmission chains, UDS provided additional information on founder viruses, direction of transmission and levels of TDR. Nevertheless, topology of clusters was not always consistent across gene fragments, calling for a cautious interpretation of the data. Moreover, unobserved intermediary links cannot be excluded. Using phylogenetic analysis as a forensic technique for HIV transmission investigations is risky.
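The two clade criteria in the METHODS can be illustrated with a minimal sketch. The abstract does not say how distances were computed or how the intraclade difference was summarised, so the sketch assumes an uncorrected p-distance and applies the 4.5% cut-off to every pair within the clade. Both choices, and the function names, are assumptions rather than the authors' procedure.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def is_transmission_clade(
    support: float,
    sequences: list[str],
    min_support: float = 0.70,
    max_distance: float = 0.045,
) -> bool:
    """Apply both criteria to one candidate clade.

    `support` is the branch support of the clade as a fraction (0 to 1);
    `sequences` are the aligned sequences of its members.
    """
    if support < min_support:
        return False
    return all(
        p_distance(a, b) < max_distance for a, b in combinations(sequences, 2)
    )
```

A clade that passes both checks is only a candidate transmission event. As the abstract's conclusion notes, unobserved intermediary links cannot be excluded by criteria of this kind.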
|
40
|
Campbell F, Cori A, Ferguson N, Jombart T. Bayesian inference of transmission chains using timing of symptoms, pathogen genomes and contact data. PLoS Comput Biol 2019; 15:e1006930. [PMID: 30925168 PMCID: PMC6457559 DOI: 10.1371/journal.pcbi.1006930] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2018] [Revised: 04/10/2019] [Accepted: 03/04/2019] [Indexed: 12/13/2022] Open
Abstract
There exists significant interest in developing statistical and computational tools for inferring 'who infected whom' in an infectious disease outbreak from densely sampled case data, with most recent studies focusing on the analysis of whole genome sequence data. However, genomic data can be poorly informative of transmission events if mutations accumulate too slowly to resolve individual transmission pairs or if there exist multiple pathogen lineages within-host, and there has been little focus on incorporating other types of outbreak data. We present here a methodology that uses contact data for the inference of transmission trees in a statistically rigorous manner, alongside genomic data and temporal data. Contact data is frequently collected in outbreaks of pathogens spread by close contact, including Ebola virus (EBOV), severe acute respiratory syndrome coronavirus (SARS-CoV) and Mycobacterium tuberculosis (TB), and routinely used to reconstruct transmission chains. As an improvement over previous, ad-hoc approaches, we developed a probabilistic model that relates a set of contact data to an underlying transmission tree and integrated this in the outbreaker2 inference framework. By analyzing simulated outbreaks under various contact tracing scenarios, we demonstrate that contact data significantly improves our ability to reconstruct transmission trees, even under realistic limitations on the coverage of the contact tracing effort and the amount of non-infectious mixing between cases. Indeed, contact data is equally or more informative than fully sampled whole genome sequence data in certain scenarios. We then use our method to analyze the early stages of the 2003 SARS outbreak in Singapore and describe the range of transmission scenarios consistent with contact data and genetic sequence in a probabilistic manner for the first time.
This simple yet flexible model can easily be incorporated into existing tools for outbreak reconstruction and should permit a better integration of genomic and epidemiological data for inferring transmission chains.
Collapse
Affiliation(s)
- Finlay Campbell
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
| | - Anne Cori
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
| | - Neil Ferguson
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
| | - Thibaut Jombart
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
- Department of Infectious Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, United Kingdom
- UK Public Health Rapid Support Team, London, United Kingdom
| |
|
41
|
Miller JK, Chen J, Sundermann A, Marsh JW, Saul MI, Shutt KA, Pacey M, Mustapha MM, Harrison LH, Dubrawski A. Statistical outbreak detection by joining medical records and pathogen similarity. J Biomed Inform 2019; 91:103126. [PMID: 30771483 PMCID: PMC6424617 DOI: 10.1016/j.jbi.2019.103126] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/16/2018] [Revised: 01/05/2019] [Accepted: 02/06/2019] [Indexed: 01/08/2023]
Abstract
We present a statistical inference model for the detection and characterization of outbreaks of hospital-associated infection. The approach combines patient exposures, determined from electronic medical records, and pathogen similarity, determined by whole-genome sequencing, to simultaneously identify probable outbreaks and their root causes. We show how our model can be used to target isolates for whole-genome sequencing, improving outbreak detection and characterization even without comprehensive sequencing. Additionally, we demonstrate how to learn model parameters from reference data of known outbreaks. We demonstrate model performance using semi-synthetic experiments.
Affiliation(s)
- James K Miller
- Auton Lab, Carnegie Mellon University, Pittsburgh, PA, United States.
| | - Jieshi Chen
- Auton Lab, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Alexander Sundermann
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States; Department of Infection Control and Hospital Epidemiology, University of Pittsburgh Medical Center, Pittsburgh, PA, United States
| | - Jane W Marsh
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
| | - Melissa I Saul
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, United States
| | - Kathleen A Shutt
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
| | - Marissa Pacey
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
| | - Mustapha M Mustapha
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
| | - Lee H Harrison
- Infectious Diseases Epidemiology Research Unit, University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA, United States
| | - Artur Dubrawski
- Auton Lab, Carnegie Mellon University, Pittsburgh, PA, United States
| |
|
42
|
Campbell F, Didelot X, Fitzjohn R, Ferguson N, Cori A, Jombart T. outbreaker2: a modular platform for outbreak reconstruction. BMC Bioinformatics 2018; 19:363. [PMID: 30343663 PMCID: PMC6196407 DOI: 10.1186/s12859-018-2330-z] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Reconstructing individual transmission events in an infectious disease outbreak can provide valuable information and help inform infection control policy. Recent years have seen considerable progress in the development of methodologies for reconstructing transmission chains using both epidemiological and genetic data. However, only a few of these methods have been implemented in software packages, and with little consideration for customisability and interoperability. Users are therefore limited to a small number of alternatives, incompatible tools with fixed functionality, or forced to develop their own algorithms at considerable personal effort. RESULTS: Here we present outbreaker2, a flexible framework for outbreak reconstruction. This R package re-implements and extends the original model introduced with outbreaker, but most importantly also provides a modular platform allowing users to specify custom models within an optimised inferential framework. As a proof of concept, we implement the within-host evolutionary model introduced with TransPhylo, which is very distinct from the original genetic model in outbreaker, and demonstrate how even complex model results can be successfully included with minimal effort. CONCLUSIONS: outbreaker2 provides a valuable starting point for future outbreak reconstruction tools, and represents a unifying platform that promotes customisability and interoperability. Implemented in the R software, outbreaker2 joins a growing body of tools for outbreak analysis.
Affiliation(s)
- Finlay Campbell
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
| | - Xavier Didelot
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
| | - Rich Fitzjohn
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
| | - Neil Ferguson
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
| | - Anne Cori
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
| | - Thibaut Jombart
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
| |
|
43
|
De Maio N, Worby CJ, Wilson DJ, Stoesser N. Bayesian reconstruction of transmission within outbreaks using genomic variants. PLoS Comput Biol 2018; 14:e1006117. [PMID: 29668677 PMCID: PMC5927459 DOI: 10.1371/journal.pcbi.1006117] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/09/2017] [Revised: 04/30/2018] [Accepted: 04/03/2018] [Indexed: 01/19/2023] Open
Abstract
Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.
We present a new tool to reconstruct transmission events within outbreaks. Our approach makes use of pathogen genetic information, notably genetic variants at low frequency within host that are usually discarded, and combines it with epidemiological information of host exposure to infection. This leads to accurate reconstruction of transmission even in cases where abundant within-host pathogen genetic variation and weak transmission bottlenecks (multiple pathogen units colonising a new host at transmission) would otherwise make inference difficult due to the transmission history differing from the pathogen evolution history inferred from pathogen isolates. Also, the use of within-host pathogen genomic variants increases the resolution of the reconstruction of the transmission tree even in scenarios with limited within-outbreak pathogen genetic diversity: within-host pathogen populations that appear identical at the level of consensus sequences can be discriminated using within-host variants. Our Bayesian approach provides a measure of the confidence in different possible transmission histories, and is published as open source software. We show with simulations and with an analysis of the beginning of the 2014 Ebola outbreak that our approach is applicable in many scenarios, improves our understanding of transmission dynamics, and will contribute to finding and limiting sources and routes of transmission, and therefore preventing the spread of infectious disease.
Affiliation(s)
- Nicola De Maio
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Colin J Worby
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America
| | - Daniel J Wilson
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicole Stoesser
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| |
|
44
|
Abstract
Transmissibility is the defining characteristic of infectious diseases. Quantifying transmission matters for understanding infectious disease epidemiology and designing evidence-based disease control programs. Tracing individual transmission events can be achieved by epidemiological investigation coupled with pathogen typing or genome sequencing. Individual infectiousness can be estimated by measuring pathogen loads, but few studies have directly estimated the ability of infected hosts to transmit to uninfected hosts. Individuals' opportunities to transmit infection are dependent on behavioral and other risk factors relevant given the transmission route of the pathogen concerned. Transmission at the population level can be quantified through knowledge of risk factors in the population or phylogeographic analysis of pathogen sequence data. Mathematical model-based approaches require estimation of the per capita transmission rate and basic reproduction number, obtained by fitting models to case data and/or analysis of pathogen sequence data. Heterogeneities in infectiousness, contact behavior, and susceptibility can have substantial effects on the epidemiology of an infectious disease, so estimates of only mean values may be insufficient. For some pathogens, super-shedders (infected individuals who are highly infectious) and super-spreaders (individuals with more opportunities to transmit infection) may be important. Future work on quantifying transmission should involve integrated analyses of multiple data sources.
|
45
|
Stadler T, Gavryushkina A, Warnock RCM, Drummond AJ, Heath TA. The fossilized birth-death model for the analysis of stratigraphic range data under different speciation modes. J Theor Biol 2018; 447:41-55. [PMID: 29550451 PMCID: PMC5931795 DOI: 10.1016/j.jtbi.2018.03.005] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 05/24/2017] [Revised: 09/15/2017] [Accepted: 03/05/2018] [Indexed: 10/26/2022]
Abstract
A birth-death-sampling model gives rise to phylogenetic trees with samples from the past and the present. Interpreting "birth" as branching speciation, "death" as extinction, and "sampling" as fossil preservation and recovery, this model - also referred to as the fossilized birth-death (FBD) model - gives rise to phylogenetic trees on extant and fossil samples. The model has been mathematically analyzed and successfully applied to a range of datasets on different taxonomic levels, such as penguins, plants, and insects. However, the current mathematical treatment of this model does not allow for a group of temporally distinct fossil specimens to be assigned to the same species. In this paper, we provide a general mathematical FBD modeling framework that explicitly takes "stratigraphic ranges" into account, with a stratigraphic range being defined as the lineage interval associated with a single species, ranging through time from the first to the last fossil appearance of the species. To assign a sequence of fossil samples in the phylogenetic tree to the same species, i.e., to specify a stratigraphic range, we need to define the mode of speciation. We provide expressions to account for three common speciation modes: budding (or asymmetric) speciation, bifurcating (or symmetric) speciation, and anagenetic speciation. Our equations allow for flexible joint Bayesian analysis of paleontological and neontological data. Furthermore, our framework is directly applicable to epidemiology, where a stratigraphic range is the observed duration of infection of a single patient, "birth" via budding is transmission, "death" is recovery, and "sampling" is sequencing the pathogen of a patient. Thus, we present a model that allows for incorporation of multiple observations through time from a single patient.
Affiliation(s)
- Tanja Stadler
- Department of Biosystems Science & Engineering, Eidgenössische Technische Hochschule Zürich, Basel 4058, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland.
| | - Alexandra Gavryushkina
- Department of Biosystems Science & Engineering, Eidgenössische Technische Hochschule Zürich, Basel 4058, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland
| | - Rachel C M Warnock
- Department of Biosystems Science & Engineering, Eidgenössische Technische Hochschule Zürich, Basel 4058, Switzerland; Swiss Institute of Bioinformatics (SIB), Switzerland
| | - Alexei J Drummond
- Department of Computer Science, Centre for Computational Evolution, University of Auckland, Auckland 1010, New Zealand
| | - Tracy A Heath
- Department of Ecology, Evolution, & Organismal Biology, Iowa State University, Ames, Iowa 50011, USA
| |
|
46
|
Wymant C, Hall M, Ratmann O, Bonsall D, Golubchik T, de Cesare M, Gall A, Cornelissen M, Fraser C. PHYLOSCANNER: Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Mol Biol Evol 2018; 35:719-733. [PMID: 29186559 PMCID: PMC5850600 DOI: 10.1093/molbev/msx304] [Citation(s) in RCA: 96] [Impact Index Per Article: 16.0] [Indexed: 02/07/2023] Open
Abstract
A central feature of pathogen genomics is that different infectious particles (virions and bacterial cells) within an infected individual may be genetically distinct, with patterns of relatedness among infectious particles being the result of both within-host evolution and transmission from one host to the next. Here, we present a new software tool, phyloscanner, which analyses pathogen diversity from multiple infected hosts. phyloscanner provides unprecedented resolution into the transmission process, allowing inference of the direction of transmission from sequence data alone. Multiply infected individuals are also identified, as they harbor subpopulations of infectious particles that are not connected by within-host evolution, except where recombinant types emerge. Low-level contamination is flagged and removed. We illustrate phyloscanner on both viral and bacterial pathogens, namely HIV-1 sequenced on Illumina and Roche 454 platforms, HCV sequenced with the Oxford Nanopore MinION platform, and Streptococcus pneumoniae with sequences from multiple colonies per individual. phyloscanner is available from https://github.com/BDI-pathogens/phyloscanner.
Affiliation(s)
- Chris Wymant
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, United Kingdom
- Department of Infectious Disease Epidemiology, Medical Research Council Centre for Outbreak Analysis and Modelling, Imperial College London, London, United Kingdom
| | - Matthew Hall
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, United Kingdom
- Department of Infectious Disease Epidemiology, Medical Research Council Centre for Outbreak Analysis and Modelling, Imperial College London, London, United Kingdom
| | - Oliver Ratmann
- Department of Infectious Disease Epidemiology, Medical Research Council Centre for Outbreak Analysis and Modelling, Imperial College London, London, United Kingdom
- Department of Mathematics, Imperial College London, London, United Kingdom
| | - David Bonsall
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, United Kingdom
- Peter Medawar Building for Pathogen Research, Nuffield Department of Medicine and the NIHR Oxford BRC, University of Oxford, United Kingdom
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, United Kingdom
| | - Tanya Golubchik
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, United Kingdom
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, United Kingdom
| | - Mariateresa de Cesare
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, United Kingdom
| | - Astrid Gall
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
| | - Marion Cornelissen
- Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
| | - Christophe Fraser
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, United Kingdom
- Department of Infectious Disease Epidemiology, Medical Research Council Centre for Outbreak Analysis and Modelling, Imperial College London, London, United Kingdom
| | | |
|
47
|
Campbell F, Strang C, Ferguson N, Cori A, Jombart T. When are pathogen genome sequences informative of transmission events? PLoS Pathog 2018; 14:e1006885. [PMID: 29420641 PMCID: PMC5821398 DOI: 10.1371/journal.ppat.1006885] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 06/07/2017] [Revised: 02/21/2018] [Accepted: 01/18/2018] [Indexed: 01/19/2023] Open
Abstract
Recent years have seen the development of numerous methodologies for reconstructing transmission trees in infectious disease outbreaks from densely sampled whole genome sequence data. However, a fundamental and as yet poorly addressed limitation of such approaches is the requirement for genetic diversity to arise on epidemiological timescales. Specifically, the position of infected individuals in a transmission tree can only be resolved by genetic data if mutations have accumulated between the sampled pathogen genomes. To quantify and compare the useful genetic diversity expected from genetic data in different pathogen outbreaks, we introduce here the concept of ‘transmission divergence’, defined as the number of mutations separating whole genome sequences sampled from transmission pairs. Using parameter values obtained by literature review, we simulate outbreak scenarios alongside sequence evolution using two models described in the literature to describe transmission divergence of ten major outbreak-causing pathogens. We find that while mean values vary significantly between the pathogens considered, their transmission divergence is generally very low, with many outbreaks characterised by large numbers of genetically identical transmission pairs. We describe the impact of transmission divergence on our ability to reconstruct outbreaks using two outbreak reconstruction tools, the R packages outbreaker and phybreak, and demonstrate that, in agreement with previous observations, genetic sequence data of rapidly evolving pathogens such as RNA viruses can provide valuable information on individual transmission events. Conversely, sequence data of pathogens with lower mean transmission divergence, including Streptococcus pneumoniae, Shigella sonnei and Clostridium difficile, provide little to no information about individual transmission events. Our results highlight the informational limitations of genetic sequence data in certain outbreak scenarios, and demonstrate the need to expand the toolkit of outbreak reconstruction tools to integrate other types of epidemiological data.
The increasing availability of genetic sequence data has sparked an interest in using pathogen whole genome sequences to reconstruct the history of individual transmission events in an infectious disease outbreak. However, such methodologies rely on pathogen genomes mutating rapidly enough to discriminate between infected individuals, an assumption that remains to be investigated. To determine pathogen outbreaks for which genetic data is expected to be informative of transmission events, we introduce here the concept of ‘transmission divergence’, defined as the number of mutations separating pathogen genome sequences sampled from transmission pairs. We characterise transmission divergence of ten major outbreak-causing pathogens using simulations and find significant variation between diseases, with viral outbreaks generally exhibiting higher transmission divergence than bacterial ones. We reconstruct these outbreaks using the R packages outbreaker and phybreak and find that genetic sequence data, though useful for rapidly evolving pathogens, provides little to no information about outbreaks with low transmission divergence, such as Streptococcus pneumoniae and Shigella sonnei. Our results demonstrate the need to incorporate other sources of outbreak data, such as contact tracing data and spatial location data, into outbreak reconstruction tools.
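The definition of transmission divergence given in this abstract (the number of mutations separating genomes sampled from a transmission pair) can be illustrated with a minimal Python sketch. It assumes aligned sequences of equal length and counts differing sites as a simple stand-in for mutations; the function names, the toy genomes and the list of transmission pairs are all hypothetical and are not taken from the paper or from the outbreaker or phybreak packages.

```python
from typing import Dict, List, Tuple


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def transmission_divergence(
    genomes: Dict[str, str], pairs: List[Tuple[str, str]]
) -> List[int]:
    """Return the site differences for each (infector, infectee) pair."""
    return [snp_distance(genomes[src], genomes[dst]) for src, dst in pairs]


# Toy data (hypothetical): one aligned genome per case.
genomes = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAC",  # identical to A
    "C": "ACGTTCGTAC",  # one substitution relative to B
    "D": "ACGTTCGTGG",  # two further substitutions relative to C
}
# Assumed known transmission pairs, as would be available in a simulation.
pairs = [("A", "B"), ("B", "C"), ("C", "D")]

divergence = transmission_divergence(genomes, pairs)
mean_divergence = sum(divergence) / len(divergence)
# Fraction of pairs that genetic data alone cannot tell apart.
identical_fraction = sum(d == 0 for d in divergence) / len(divergence)
```

In this toy example one of the three pairs is genetically identical, which is the situation the abstract describes as uninformative for resolving who infected whom. A real analysis would also need to handle ambiguous bases, gaps and within-host diversity, none of which this sketch attempts.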
Affiliation(s)
- Finlay Campbell
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
| | - Camilla Strang
- Centre for Preventive Medicine, Department of Epidemiology and Disease Surveillance, Animal Health Trust, Suffolk, United Kingdom
| | - Neil Ferguson
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
| | - Anne Cori
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
| | - Thibaut Jombart
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
| |
|
48
|
Kendall M, Ayabina D, Xu Y, Stimson J, Colijn C. Estimating Transmission from Genetic and Epidemiological Data: A Metric to Compare Transmission Trees. Stat Sci 2018. [DOI: 10.1214/17-sts637] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]
|
49
|
Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol 2018; 4:vey016. [PMID: 29942656 PMCID: PMC6007674 DOI: 10.1093/ve/vey016] [Citation(s) in RCA: 1874] [Impact Index Per Article: 312.3] [Indexed: 02/07/2023] Open
Abstract
The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package has become a primary tool for Bayesian phylogenetic and phylodynamic inference from genetic sequence data. BEAST unifies molecular phylogenetic reconstruction with complex discrete and continuous trait evolution, divergence-time dating, and coalescent demographic models in an efficient statistical inference engine using Markov chain Monte Carlo integration. A convenient, cross-platform, graphical user interface allows the flexible construction of complex evolutionary analyses.
Affiliation(s)
- Marc A Suchard
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, 621 Charles E. Young Dr., South, Los Angeles, CA, 90095 USA
- Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles, 650 Charles E. Young Dr., South, Los Angeles, CA, 90095 USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, 695 Charles E. Young Dr., South, Los Angeles, CA, 90095 USA
| | - Philippe Lemey
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
| | - Guy Baele
- Department of Microbiology and Immunology, Rega Institute, KU Leuven, Herestraat 49, 3000 Leuven, Belgium
| | - Daniel L Ayres
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, 125 Biomolecular Science Bldg #296, College Park, MD 20742 USA
| | - Alexei J Drummond
- Department of Computer Science, University of Auckland, 303/38 Princes St., Auckland, 1010 NZ
- Centre for Computational Evolution, University of Auckland, 303/38 Princes St., Auckland, 1010 NZ
| | - Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, EH9 3FL UK
| |
|
50
|
Taylor AR, Schaffner SF, Cerqueira GC, Nkhoma SC, Anderson TJC, Sriprawat K, Pyae Phyo A, Nosten F, Neafsey DE, Buckee CO. Quantifying connectivity between local Plasmodium falciparum malaria parasite populations using identity by descent. PLoS Genet 2017; 13:e1007065. [PMID: 29077712 PMCID: PMC5678785 DOI: 10.1371/journal.pgen.1007065] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 06/16/2017] [Revised: 11/08/2017] [Accepted: 10/10/2017] [Indexed: 01/18/2023] Open
Abstract
With the rapidly increasing abundance and accessibility of genomic data, there is a growing interest in using population genetic approaches to characterize fine-scale dispersal of organisms, providing insight into biological processes across a broad range of fields including ecology, evolution and epidemiology. For sexually recombining haploid organisms such as the human malaria parasite P. falciparum, however, there have been no systematic assessments of the type of data and methods required to resolve fine scale connectivity. This analytical gap hinders the use of genomics for understanding local transmission patterns, a crucial goal for policy makers charged with eliminating this important human pathogen. Here we use data collected from four clinics with a catchment area spanning approximately 120 km of the Thai-Myanmar border to compare the ability of divergence (FST) and relatedness based on identity by descent (IBD) to resolve spatial connectivity between malaria parasites collected from proximal clinics. We found no relationship between inter-clinic distance and FST, likely due to sampling of highly related parasites within clinics, but a significant decline in IBD-based relatedness with increasing inter-clinic distance. This association was contingent upon the data set type and size. We estimated that approximately 147 single-infection whole genome sequenced parasite samples or 222 single-infection parasite samples genotyped at 93 single nucleotide polymorphisms (SNPs) were sufficient to recover a robust spatial trend estimate at this scale. In summary, surveillance efforts cannot rely on classical measures of genetic divergence to measure P. falciparum transmission on a local scale. Given adequate sampling, IBD-based relatedness provides a useful alternative, and robust trends can be obtained from parasite samples genotyped at approximately 100 SNPs.
Affiliation(s)
- Aimee R. Taylor
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America
| | - Stephen F. Schaffner
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America
| | - Gustavo C. Cerqueira
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America
| | - Standwell C. Nkhoma
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, United States of America
| | - Timothy J. C. Anderson
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, United States of America
| | - Kanlaya Sriprawat
- Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand
| | - Aung Pyae Phyo
- Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand
| | - François Nosten
- Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine Research building, University of Oxford, Old Road campus, Oxford, United Kingdom
| | - Daniel E. Neafsey
- Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America
- Department of Immunology and Infectious Disease, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Caroline O. Buckee
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| |
|