1
Kupperman MD, Ke R, Leitner T. Identifying Impacts of Contact Tracing on Epidemiological Inference from Phylogenetic Data. bioRxiv 2024:2023.11.30.567148. [PMID: 38076930 PMCID: PMC10705478 DOI: 10.1101/2023.11.30.567148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Robust sampling methods are foundational to inferences using phylogenies. Yet the impact of using contact tracing, a type of non-uniform sampling used in public health applications such as infectious disease outbreak investigations, has not been investigated in the molecular epidemiology field. To understand how contact tracing influences a recovered phylogeny, we developed a new simulation tool called SEEPS (Sequence Evolution and Epidemiological Process Simulator) that allows for the simulation of contact tracing and the resulting transmission tree, pathogen phylogeny, and corresponding virus genetic sequences. Importantly, SEEPS takes within-host evolution into account when generating pathogen phylogenies and sequences from transmission histories. Using SEEPS, we demonstrate that contact tracing can significantly impact the structure of the resulting tree, as described by popular tree statistics. Contact tracing generates phylogenies that are less balanced than the underlying transmission process, less representative of the larger epidemiological process, and affects the internal/external branch length ratios that characterize specific epidemiological scenarios. We also examined real data from a 2007-2008 Swedish HIV-1 outbreak and the broader 1998-2010 European HIV-1 epidemic to highlight the differences in contact tracing and expected phylogenies. Aided by SEEPS, we show that the data collection of the Swedish outbreak was strongly influenced by contact tracing even after downsampling, while the broader European Union epidemic showed little evidence of universal contact tracing, agreeing with the known epidemiological information about sampling and spread. Overall, our results highlight the importance of including possible non-uniform sampling schemes when examining phylogenetic trees. For that, SEEPS serves as a useful tool to evaluate such impacts, thereby facilitating better phylogenetic inferences of the characteristics of a disease outbreak. 
SEEPS is available at github.com/MolEvolEpid/SEEPS.
Affiliation(s)
- Michael D. Kupperman
  - Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico, United States of America
  - Department of Applied Mathematics, University of Washington, Washington, United States of America
- Ruian Ke
  - Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico, United States of America
- Thomas Leitner
  - Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico, United States of America
2
Wagle S, Markin A, Górecki P, Anderson TK, Eulenstein O. Asymmetric Cluster-Based Measures for Comparative Phylogenetics. J Comput Biol 2024; 31:312-327. [PMID: 38634854 PMCID: PMC11057527 DOI: 10.1089/cmb.2023.0338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the evaluation of the generated hypotheses is achieved using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost to compare the topology of two trees, but this cost is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance called the Cluster Affinity (CA) distance was introduced. However, CA distances are symmetric and cannot compare different types of trees. These asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original Affinity distance to compare heterogeneous trees called the asymmetric CA cost. We also develop a biologically interpretable cost, the Cluster Support cost that normalizes by cluster size across gene trees. The characteristics of these costs are similar to the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these to standardize the cost to be applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
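As a point of reference for the Robinson-Foulds (RF) distance mentioned in this abstract, the following is a minimal sketch of the rooted, cluster-based form of the distance. It is not the authors' Cluster Affinity or asymmetric cost. The nested-tuple tree representation and the function names `clusters` and `rf_distance` are assumptions made for illustration, and both trees are assumed to share the same leaf set.

```python
def clusters(tree):
    """Return the non-trivial clusters (frozensets of leaf labels) of a rooted tree.

    Assumed representation: a tree is either a leaf label (str) or a tuple of subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)  # every internal node contributes one cluster
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root cluster is shared by all trees on these leaves
    return found


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: number of clusters present in exactly one of the two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(rf_distance(balanced, caterpillar))  # 2: {C, D} and {A, B, C} are unshared
```

Because every unshared cluster counts as one full unit regardless of how close it is to a cluster in the other tree, a single misplaced leaf can change many clusters at once, which is the sensitivity the abstract describes.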
Affiliation(s)
- Sanket Wagle
  - Department of Computer Science, Iowa State University, Ames, Iowa, USA
- Alexey Markin
  - National Animal Disease Center, USDA-ARS, Ames, Iowa, USA
- Paweł Górecki
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
- Oliver Eulenstein
  - Department of Computer Science, Iowa State University, Ames, Iowa, USA
3
Castro LA, Leitner T, Romero-Severson E. Recombination smooths the time signal disrupted by latency in within-host HIV phylogenies. Virus Evol 2023; 9:vead032. [PMID: 37397911 PMCID: PMC10313349 DOI: 10.1093/ve/vead032] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2022] [Revised: 04/07/2023] [Accepted: 05/15/2023] [Indexed: 07/04/2023]
Abstract
Within-host Human immunodeficiency virus (HIV) evolution involves several features that may disrupt standard phylogenetic reconstruction. One important feature is reactivation of latently integrated provirus, which has the potential to disrupt the temporal signal, leading to variation in the branch lengths and apparent evolutionary rates in a tree. Yet, real within-host HIV phylogenies tend to show clear, ladder-like trees structured by the time of sampling. Another important feature is recombination, which violates the fundamental assumption that evolutionary history can be represented by a single bifurcating tree. Thus, recombination complicates the within-host HIV dynamic by mixing genomes and creating evolutionary loop structures that cannot be represented in a bifurcating tree. In this paper, we develop a coalescent-based simulator of within-host HIV evolution that includes latency, recombination, and effective population size dynamics that allows us to study the relationship between the true, complex genealogy of within-host HIV evolution, encoded as an ancestral recombination graph (ARG), and the observed phylogenetic tree. To compare our ARG results to the familiar phylogeny format, we calculate the expected bifurcating tree after decomposing the ARG into all unique site trees, their combined distance matrix, and the overall corresponding bifurcating tree. While latency and recombination separately disrupt the phylogenetic signal, remarkably, we find that recombination recovers the temporal signal of within-host HIV evolution caused by latency by mixing fragments of old, latent genomes into the contemporary population. In effect, recombination averages over extant heterogeneity, whether it stems from mixed time signals or population bottlenecks. Furthermore, we establish that the signals of latency and recombination can be observed in phylogenetic trees despite being an incorrect representation of the true evolutionary history. 
Using an approximate Bayesian computation method, we develop a set of statistical probes to tune our simulation model to nine longitudinally sampled within-host HIV phylogenies. Because ARGs are exceedingly difficult to infer from real HIV data, our simulation system allows investigating effects of latency, recombination, and population size bottlenecks by matching decomposed ARGs to real data as observed in standard phylogenies.
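To illustrate the general idea of approximate Bayesian computation (ABC) referred to here, the following is a toy rejection-ABC sketch. It does not reproduce the authors' simulator or statistical probes. The Gaussian toy model, the flat prior on [0, 10], the tolerance, and the function names are all assumptions chosen only to keep the example self-contained.

```python
import random
import statistics


def simulate_summary(theta, rng, n_obs=20):
    """Toy simulator (assumption): mean of n_obs Gaussian draws centred on theta."""
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n_obs)])


def abc_rejection(observed, n_draws=2000, tolerance=0.5, seed=1):
    """Keep prior draws whose simulated summary lies within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)  # candidate parameter from a flat prior
        if abs(simulate_summary(theta, rng) - observed) < tolerance:
            accepted.append(theta)  # simulation resembles the data: keep the candidate
    return accepted


posterior = abc_rejection(observed=5.0)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values approximate the posterior distribution of the parameter without ever evaluating a likelihood, which is why the approach suits simulators whose likelihood is intractable.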
Affiliation(s)
- Thomas Leitner
  - Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
4
A deep learning approach to real-time HIV outbreak detection using genetic data. PLoS Comput Biol 2022; 18:e1010598. [PMID: 36240224 PMCID: PMC9604978 DOI: 10.1371/journal.pcbi.1010598] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/17/2021] [Revised: 10/26/2022] [Accepted: 09/23/2022] [Indexed: 12/15/2022]
Abstract
Pathogen genomic sequence data are increasingly made available for epidemiological monitoring. A main interest is to identify and assess the potential of infectious disease outbreaks. While popular methods to analyze sequence data often involve phylogenetic tree inference, they are vulnerable to errors from recombination and impose a high computational cost, making it difficult to obtain real-time results when the number of sequences is in or above the thousands. Here, we propose an alternative strategy for outbreak detection using genomic data, based on deep learning methods developed for image classification. The key idea is to use a pairwise genetic distance matrix calculated from viral sequences as an image, and develop convolutional neural network (CNN) models to classify areas of the images that show signatures of active outbreak, leading to identification of subsets of sequences taken from an active outbreak. We showed that our method is efficient in finding HIV-1 outbreaks with R0 ≥ 2.5, with an overall specificity exceeding 98% and sensitivity better than 92%. We validated our approach using data from HIV-1 CRF01 in Europe, containing both endemic sequences and a well-known dual outbreak in intravenous drug users. Our model accurately identified known outbreak sequences in the background of slower spreading HIV. Importantly, we detected both outbreaks early on, before they were over, implying that had this method been applied in real-time as data became available, one would have been able to intervene and possibly prevent the extent of these outbreaks. This approach is scalable to processing hundreds of thousands of sequences, making it useful for current and future real-time epidemiological investigations, including public health monitoring using large databases and especially for rapid outbreak identification.
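The pairwise genetic distance matrix that this abstract treats as an image can be sketched as follows. The abstract does not state which distance measure was used, so the uncorrected p-distance (proportion of differing sites) is an assumption here, as are the function names and the toy sequences.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gaps and N."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore sites that are uninformative in either sequence
        compared += 1
        differing += a != b
    return differing / compared if compared else 0.0


def distance_matrix(sequences):
    """Symmetric matrix of pairwise p-distances with a zero diagonal."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = p_distance(sequences[i], sequences[j])
    return matrix


toy_alignment = ["ACGTACGT", "ACGTACGA", "ACTTACGA"]  # hypothetical aligned sequences
for row in distance_matrix(toy_alignment):
    print(row)
```

Each entry can then be read as a pixel intensity, so a block of closely related sequences shows up as a low-valued square along the diagonal.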
5
Lundgren E, Romero-Severson E, Albert J, Leitner T. Combining biomarker and virus phylogenetic models improves HIV-1 epidemiological source identification. PLoS Comput Biol 2022; 18:e1009741. [PMID: 36026480 PMCID: PMC9455879 DOI: 10.1371/journal.pcbi.1009741] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/09/2021] [Revised: 09/08/2022] [Accepted: 08/02/2022] [Indexed: 01/07/2023]
Abstract
To identify and stop active HIV transmission chains, new epidemiological techniques are needed. Here, we describe the development of a multi-biomarker augmentation to phylogenetic inference of the underlying transmission history in a local population. HIV biomarkers are measurable biological quantities that have some relationship to the amount of time someone has been infected with HIV. To train our model, we used five biomarkers based on real data from serological assays, HIV sequence data, and target cell counts in longitudinally followed, untreated patients with known infection times. The biomarkers were modeled with a mixed effects framework to allow for patient-specific variation and general trends, and fit to patient data using Markov Chain Monte Carlo (MCMC) methods. Subsequently, the density of the unobserved infection time conditional on observed biomarkers was obtained by integrating out the random effects from the model fit. This probabilistic information about infection times was incorporated into the likelihood function for the transmission history and phylogenetic tree reconstruction, informed by the HIV sequence data. To critically test our methodology, we developed a coalescent-based simulation framework that generates phylogenies and biomarkers given a specific or general transmission history. Testing on many epidemiological scenarios showed that biomarker-augmented phylogenetics can reach 90% accuracy under idealized situations. Under realistic within-host HIV-1 evolution, involving substantial within-host diversification and frequent transmission of multiple lineages, the average accuracy was about 50% in transmission clusters involving 5-50 hosts. Realistic biomarker data added on average 16 percentage points over using the phylogeny alone. Using more biomarkers improved the performance.
Shorter temporal spacing between transmission events and increased transmission heterogeneity reduced reconstruction accuracy, but larger clusters were not harder to get right. More sequence data per infected host also improved accuracy. We show that the method is robust to incomplete sampling and that adding biomarkers improves reconstructions of real HIV-1 transmission histories. The technology presented here could allow for better prevention programs by providing data for locally informed and tailored strategies.
Affiliation(s)
- Erik Lundgren
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Ethan Romero-Severson
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Jan Albert
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
  - Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
- Thomas Leitner
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
6
Guang A, Howison M, Ledingham L, D’Antuono M, Chan PA, Lawrence C, Dunn CW, Kantor R. Incorporating Within-Host Diversity in Phylogenetic Analyses for Detecting Clusters of New HIV Diagnoses. Front Microbiol 2022; 12:803190. [PMID: 35250908 PMCID: PMC8891961 DOI: 10.3389/fmicb.2021.803190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2021] [Accepted: 12/22/2021] [Indexed: 11/29/2022]
Abstract
Background: Phylogenetic analyses of HIV sequences are used to detect clusters and inform public health interventions. Conventional approaches summarize within-host HIV diversity with a single consensus sequence per host of the pol gene, obtained from Sanger or next-generation sequencing (NGS). There is growing recognition that this approach discards potentially important information about within-host sequence variation, which can impact phylogenetic inference. However, whether alternative summary methods that incorporate intra-host variation impact phylogenetic inference of transmission network features is unknown. Methods: We introduce profile sampling, a method to incorporate within-host NGS sequence diversity into phylogenetic HIV cluster inference. We compare this approach to Sanger- and NGS-derived pol and near-whole-genome consensus sequences and evaluate its potential benefits in identifying molecular clusters among all newly-HIV-diagnosed individuals over six months at the largest HIV center in Rhode Island. Results: Profile sampling cluster inference demonstrated that within-host viral diversity impacts phylogenetic inference across individuals, and that consensus sequence approaches can obscure both magnitude and effect of these impacts. Clustering differed between Sanger- and NGS-derived consensus and profile sampling sequences, and across gene regions. Discussion: Profile sampling can incorporate within-host HIV diversity captured by NGS into phylogenetic analyses. This additional information can improve robustness of cluster detection.
Affiliation(s)
- August Guang
  - Center for Computational Biology of Human Disease, Brown University, Providence, RI, United States
  - Center for Computation and Visualization, Brown University, Providence, RI, United States
  - Correspondence: August Guang
- Mark Howison
  - Research Improving People’s Lives, Providence, RI, United States
- Lauren Ledingham
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
- Matthew D’Antuono
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
- Philip A. Chan
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
- Charles Lawrence
  - Division of Applied Mathematics, Brown University, Providence, RI, United States
- Casey W. Dunn
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, United States
- Rami Kantor
  - Division of Infectious Diseases, The Alpert Medical School, Brown University, Providence, RI, United States
7
Chindelevitch L, Hayati M, Poon AFY, Colijn C. Network science inspires novel tree shape statistics. PLoS One 2021; 16:e0259877. [PMID: 34941890 PMCID: PMC8699983 DOI: 10.1371/journal.pone.0259877] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/11/2021] [Accepted: 10/28/2021] [Indexed: 11/18/2022]
Abstract
The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
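The two conventional imbalance measures named in this abstract, the Sackin and Colless indices, can be sketched for small rooted binary trees as follows. This is not code from the treeCentrality package. The nested-tuple tree representation and the function names are assumptions for illustration, and the indices are left unnormalized.

```python
def sackin(tree, depth=0):
    """Sackin index: sum over leaves of the number of edges between root and leaf."""
    if isinstance(tree, str):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)


def colless(tree):
    """Colless index: sum over internal nodes of |leaves on left - leaves on right|."""

    def walk(node):
        if isinstance(node, str):
            return 1, 0  # (number of leaves, accumulated imbalance)
        left, right = node  # binary trees only
        n_left, imb_left = walk(left)
        n_right, imb_right = walk(right)
        return n_left + n_right, imb_left + imb_right + abs(n_left - n_right)

    return walk(tree)[1]


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(sackin(balanced), colless(balanced))        # 8 0
print(sackin(caterpillar), colless(caterpillar))  # 9 3
```

Both indices grow as a tree becomes more ladder-like, so the caterpillar tree scores higher than the balanced tree on the same four leaves.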
Affiliation(s)
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Imperial College London, London, United Kingdom
- Maryam Hayati
  - School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Art F. Y. Poon
  - Department of Pathology & Laboratory Medicine, University of Western Ontario, London, ON, Canada
- Caroline Colijn
  - Department of Mathematics, Simon Fraser University, Burnaby, BC, Canada
8
Phylogenetic Networks and Parameters Inferred from HIV Nucleotide Sequences of High-Risk and General Population Groups in Uganda: Implications for Epidemic Control. Viruses 2021; 13:v13060970. [PMID: 34073846 PMCID: PMC8225143 DOI: 10.3390/v13060970] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/28/2021] [Revised: 05/13/2021] [Accepted: 05/18/2021] [Indexed: 12/17/2022]
Abstract
Phylogenetic inference is useful in characterising HIV transmission networks and assessing where prevention is likely to have the greatest impact. However, estimates of the parameters that influence network structure are still scarce, although they are important for evaluating determinants of HIV spread. We analyzed 2,017 HIV pol sequences (728 Lake Victoria fisherfolk communities (FFCs), 592 female sex workers (FSWs) and 697 general population (GP)) to identify transmission networks on Maximum Likelihood (ML) phylogenetic trees and refined them using time-resolved phylogenies. Network generative models were fitted to the observed degree distributions and network parameters, and corrected Akaike Information Criteria and Bayesian Information Criteria values were estimated. 347 (17.2%) HIV sequences were linked on ML trees (maximum genetic distance ≤4.5%, ≥95% bootstrap support) and, of these, 303 (86.7%) that consisted of pure A1 (n = 168) and D (n = 135) subtypes were analyzed in BEAST v1.8.4. The majority of networks (at least 40%) were found at a time depth of ≤5 years. The Waring and Yule models best fitted the networks of FFCs and FSWs, respectively, while the negative binomial model best fitted the networks in the GP. The network structure in the HIV-hyperendemic FFCs is likely to be scale-free and shaped by preferential attachment, in contrast to the GP. The findings support the targeting of interventions for FFCs in a timely manner for effective epidemic control. Interventions ought to be tailored according to the dynamics of the HIV epidemic in the target population, and understanding the network structure is critical in ensuring the success of HIV prevention programs.
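The model comparison criteria named in this abstract follow standard definitions, sketched below. The log-likelihood, parameter count, and sample size in the usage lines are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion for a model with k free parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction, for n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion for n observations."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical fit: log-likelihood -100 with 2 parameters on 50 observations.
print(aic(-100.0, 2), aicc(-100.0, 2, 50), bic(-100.0, 2, 50))
```

Lower values indicate a better trade-off between fit and complexity, so candidate degree-distribution models are ranked by whichever criterion is chosen.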
9
Angevaare J, Feng Z, Deardon R. Inference of latent event times and transmission networks in individual level infectious disease models. Spat Spatiotemporal Epidemiol 2021; 37:100410. [PMID: 33980405 DOI: 10.1016/j.sste.2021.100410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2020] [Revised: 01/20/2021] [Accepted: 01/28/2021] [Indexed: 10/22/2022]
Abstract
Transmission networks indicate who-infected-whom in epidemics. Reconstruction of transmission networks is invaluable in applying and developing effective control strategies for infectious diseases. We introduce transmission network individual level models (TN-ILMs), a competing-risk, continuous-time extension to the individual level model framework for infectious diseases of Deardon et al. (2010). Through a simulation study using a Julia language software package, Pathogen.jl, we explore the models with respect to their ability to jointly infer latent event times, latent disease transmission networks, and the TN-ILM parameters. We find good parameter, event time, and transmission network inference, with enhanced performance for inference of transmission networks in epidemic simulations that have higher spatial signals in their infectivity kernel. Finally, an application of a TN-ILM to data from a greenhouse experiment on the spread of tomato spotted wilt virus is presented.
Affiliation(s)
- Zeny Feng
  - University of Guelph, Canada. https://zfeng.uoguelph.ca
- Rob Deardon
  - University of Calgary, Canada. https://people.ucalgary.ca/~robert.deardon/
10
Zhang Y, Leitner T, Albert J, Britton T. Inferring transmission heterogeneity using virus genealogies: Estimation and targeted prevention. PLoS Comput Biol 2020; 16:e1008122. [PMID: 32881984 PMCID: PMC7494101 DOI: 10.1371/journal.pcbi.1008122] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/25/2019] [Revised: 09/16/2020] [Accepted: 07/02/2020] [Indexed: 12/19/2022]
Abstract
Spread of HIV typically involves uneven transmission patterns where some individuals spread to a large number of individuals while others to only a few or none. Such transmission heterogeneity can impact how fast and how much an epidemic spreads. Further, more efficient interventions may be achieved by taking such transmission heterogeneity into account. To address these issues, we developed two phylogenetic methods based on virus sequence data: 1) to generally detect if significant transmission heterogeneity is present, and 2) to pinpoint where in a phylogeny high-level spread is occurring. We derive inference procedures to estimate model parameters, including the amount of transmission heterogeneity, in a sampled epidemic. We show that it is possible to detect transmission heterogeneity under a wide range of simulated situations, including incomplete sampling, varying levels of heterogeneity, and including within-host genetic diversity. When evaluating real HIV-1 data from different epidemic scenarios, we found a lower level of transmission heterogeneity in slowly spreading situations and a higher level of heterogeneity in data that included a rapid outbreak, while R0 and Sackin's index (overall tree shape statistic) were similar in the two scenarios, suggesting that our new method is able to detect transmission heterogeneity in real data. We then show by simulations that targeted prevention, where we pinpoint high-level spread using a coalescence measurement, is efficient when sequence data are collected in an ongoing surveillance system. Such phylogeny-guided prevention is efficient under both single-step contact tracing as well as iterative contact tracing as compared to random intervention.
Affiliation(s)
- Yunjun Zhang
  - Department of Biostatistics, School of Public Health, Peking University, Beijing, China
  - Department of Mathematics, Stockholm University, Stockholm, Sweden
- Thomas Leitner
  - Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America
- Jan Albert
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, Stockholm, Sweden
  - Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
- Tom Britton
  - Department of Mathematics, Stockholm University, Stockholm, Sweden
11
Abstract
PURPOSE OF REVIEW: Within-host diversity complicates transmission models because it means that between-host virus phylogenies are not identical to the transmission history among the infected hosts. This review presents the biological and theoretical foundations for recent developments in this field, and shows that modern phylodynamic methods are capable of inferring realistic transmission histories from HIV sequence data. RECENT FINDINGS: Transmission of single or multiple genetic variants from a donor's HIV population results in donor-recipient phylogenies with combinations of monophyletic, paraphyletic, and polyphyletic patterns. Large-scale simulations and analyses of many real HIV datasets have established that transmission direction, directness, or common source often can be inferred based on HIV sequence data. Phylodynamic reconstructions of HIV transmissions that include within-host HIV diversity have recently been established and made available in several software packages. SUMMARY: Phylodynamic methods that include realistic features of HIV genetic diversification have come of age, significantly improving inference of key epidemiological parameters. This opens the door to more accurate surveillance and better-informed prevention campaigns.
12
13
Barido-Sottani J, Vaughan TG, Stadler T. Detection of HIV transmission clusters from phylogenetic trees using a multi-state birth-death model. J R Soc Interface 2019; 15:rsif.2018.0512. [PMID: 30185544 PMCID: PMC6170769 DOI: 10.1098/rsif.2018.0512] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 07/06/2018] [Accepted: 08/13/2018] [Indexed: 12/03/2022]
Abstract
HIV patients form clusters in HIV transmission networks. Accurate identification of these transmission clusters is essential to effectively target public health interventions. One reason for clustering is that the underlying contact network contains many local communities. We present a new maximum-likelihood method for identifying transmission clusters caused by community structure, based on phylogenetic trees. The method employs a multi-state birth–death (MSBD) model which detects changes in transmission rate, which are interpreted as the introduction of the epidemic into a new susceptible community, i.e. the formation of a new cluster. We show that the MSBD method is able to reliably infer the clusters and the transmission parameters from a pathogen phylogeny based on our simulations. In contrast to existing cutpoint-based methods for cluster identification, our method does not require that clusters be monophyletic nor is it dependent on the selection of a difficult-to-interpret cutpoint parameter. We present an application of our method to data from the Swiss HIV Cohort Study. The method is available as an easy-to-use R package.
Affiliation(s)
- Joëlle Barido-Sottani
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
- Timothy G Vaughan
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Switzerland
14
Abstract
One approach to the reconstruction of infectious disease transmission trees from pathogen genomic data has been to use a phylogenetic tree, reconstructed from pathogen sequences, and annotate its internal nodes to provide a reconstruction of which host each lineage was in at each point in time. If only one pathogen lineage can be transmitted to a new host (i.e., the transmission bottleneck is complete), this corresponds to partitioning the nodes of the phylogeny into connected regions, each of which represents evolution in an individual host. These partitions define the possible transmission trees that are consistent with a given phylogenetic tree. However, the mathematical properties of the transmission trees given a phylogeny remain largely unexplored. Here, we describe a procedure to calculate the number of possible transmission trees for a given phylogeny, and we then show how to uniformly sample from these transmission trees. The procedure is outlined for situations where one sample is available from each host and trees do not have branch lengths, and we also provide extensions for incomplete sampling, multiple sampling, and the application to time trees in a situation where limits on the period during which each host could have been infected and infectious are known. The sampling algorithm is available as an R package (STraTUS).
Affiliation(s)
- Matthew D Hall
- Nuffield Department of Medicine, Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, Canada
15
Childs LM, El Moustaid F, Gajewski Z, Kadelka S, Nikin-Beers R, Smith JW, Walker M, Johnson LR. Linked within-host and between-host models and data for infectious diseases: a systematic review. PeerJ 2019; 7:e7057. [PMID: 31249734 PMCID: PMC6589080 DOI: 10.7717/peerj.7057] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/07/2019] [Accepted: 04/28/2019] [Indexed: 12/17/2022]
Abstract
The observed dynamics of infectious diseases are driven by processes across multiple scales. Here we focus on two: within-host, that is, how an infection progresses inside a single individual (for instance viral and immune dynamics), and between-host, that is, how the infection is transmitted between multiple individuals of a host population. The dynamics of each of these may be influenced by the other, particularly across evolutionary time. Thus understanding each of these scales, and the links between them, is necessary for a holistic understanding of the spread of infectious diseases. One approach to combining these scales is through mathematical modeling. We conducted a systematic review of the published literature on multi-scale mathematical models of disease transmission (as defined by combining within-host and between-host scales) to determine the extent to which mathematical models are being used to understand across-scale transmission, and the extent to which these models are being confronted with data. Following the PRISMA guidelines for systematic reviews, we identified 24 of 197 qualifying papers across 30 years that include both linked models at the within and between host scales and that used data to parameterize/calibrate models. We find that the approach that incorporates both modeling with data is under-utilized, if increasing. This highlights the need for better communication and collaboration between modelers and empiricists to build well-calibrated models that both improve understanding and may be used for prediction.
Affiliation(s)
- Lauren M Childs
- Department of Mathematics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA
- Fadoua El Moustaid
- Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA; Global Change Center, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA
- Zachary Gajewski
- Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA; Global Change Center, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA; Department of Statistics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA
- Sarah Kadelka
- Department of Mathematics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA
- Ryan Nikin-Beers
- Department of Mathematics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA; Department of Mathematics, University of Florida, Gainesville, FL, USA
- John W Smith
- Department of Statistics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA
- Melody Walker
- Department of Mathematics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA
- Leah R Johnson
- Department of Biological Sciences, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA; Global Change Center, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA; Department of Statistics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA; Computational Modeling and Data Analytics, Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA, USA
16
Abstract
HIV is one of the fastest evolving organisms known. It evolves about 1 million times faster than its host, humans. Because HIV establishes chronic infections, with continuous evolution, its divergence within a single infected human surpasses the divergence of the entire humanoid history. Yet, it is still the same virus, infecting the same cell types and using the same replication machinery year after year. Hence, one would think that most mutations that HIV accumulates are neutral. But the picture is more complicated than that. HIV evolution is also a clear example of strong positive selection, that is, mutants have a survival advantage. How do these facts come together?
Affiliation(s)
- Thomas Leitner
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM
17
Insights on transmission of HIV from phylogenetic analysis to locally optimize HIV prevention strategies. Curr Opin HIV AIDS 2019; 13:95-101. [PMID: 29266012 DOI: 10.1097/coh.0000000000000443] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/26/2022]
Abstract
PURPOSE OF REVIEW: Phylogenetic analysis can identify transmission networks by clustering genetically related HIV genotypes that are routinely collected. In this study, we will review phylogenetic insights gained on transmission of HIV and phylogenetically optimized HIV prevention strategies. RECENT FINDINGS: Phylogenetic analysis reports that HIV transmission varies by geographical region and by route of transmission. In high-income countries, HIV is predominantly transmitted between recently infected MSM who live in the same country. In rural Uganda, transmission of HIV is frequently between different communities. Age-discrepant transmission has been reported across the world. Four studies have used phylogenetic optimization of HIV prevention. Three studies predict that immediate treatment after diagnosis would have prevented 19-42% of infections, and that preexposure prophylaxis would have prevented 66% of infections. One phylogenetic study guided a public health response to an actively ongoing HIV outbreak. Phylogenetic clustering requires a dense sample of patients and small time-gaps between infection and diagnosis. SUMMARY: Phylogenetic analysis can be an important tool to identify a local strategy that prevents most infections. Future studies that use phylogenetic analysis for optimizing HIV prevention strategies should also include cost-effectiveness so that the most cost-effective prevention method is identified.
18
Metzig C, Ratmann O, Bezemer D, Colijn C. Phylogenies from dynamic networks. PLoS Comput Biol 2019; 15:e1006761. [PMID: 30807578 PMCID: PMC6420041 DOI: 10.1371/journal.pcbi.1006761] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/15/2018] [Revised: 03/15/2019] [Accepted: 01/07/2019] [Indexed: 12/12/2022]
Abstract
The relationship between the underlying contact network over which a pathogen spreads and the pathogen phylogenetic trees that are obtained presents an opportunity to use sequence data to learn about contact networks that are difficult to study empirically. However, this relationship is not explicitly known and is usually studied in simulations, often with the simplifying assumption that the contact network is static in time, though human contact networks are dynamic. We simulate pathogen phylogenetic trees on dynamic Erdős-Rényi random networks and on two dynamic networks with skewed degree distribution, of which one is additionally clustered. We use tree shape features to explore how adding dynamics changes the relationships between the overall network structure and phylogenies. Our tree features include the number of small substructures (cherries, pitchforks) in the trees, measures of tree imbalance (Sackin index, Colless index), features derived from network science (diameter, closeness), as well as features using the internal branch lengths from the tip to the root. Using principal component analysis we find that the network dynamics influence the shapes of phylogenies, as does the network type. We also compare dynamic and time-integrated static networks. We find, in particular, that static network models like the widely used Barabási-Albert model can be poor approximations for dynamic networks. We explore the effects of mis-specifying the network on the performance of classifiers trained to identify the transmission rate (using supervised learning methods). We find that both mis-specification of the underlying network and its parameters (mean degree, turnover rate) have a strong adverse effect on the ability to estimate the transmission parameter. We illustrate these results by classifying HIV trees with a classifier that we trained on simulated trees from different networks, infection rates and turnover rates.
Our results point to the importance of correctly estimating and modelling contact networks with dynamics when using phylodynamic tools to estimate epidemiological parameters.
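To make the tree shape features named in this abstract concrete, the following minimal sketch computes three of them (cherry count, Sackin index, Colless index) using their standard textbook definitions on a rooted binary tree. The nested-tuple tree encoding and the function name `tree_shape_stats` are assumptions for illustration and are not the authors' code.

```python
# Hypothetical sketch of three tree shape statistics on a rooted binary tree.
# A tree is a tip label (str) or a pair (left_subtree, right_subtree).

def tree_shape_stats(tree):
    """Return cherry count, Sackin index and Colless index of `tree`."""
    stats = {"cherries": 0, "sackin": 0, "colless": 0}

    def visit(node):
        """Return the number of tips below `node`, updating `stats`."""
        if isinstance(node, str):
            return 1
        left, right = node
        n_left, n_right = visit(left), visit(right)
        # A cherry is an internal node whose two children are both tips.
        if n_left == 1 and n_right == 1:
            stats["cherries"] += 1
        # Sackin index: sum of tip depths, which equals the sum over
        # internal nodes of the number of tips below each node.
        stats["sackin"] += n_left + n_right
        # Colless index: sum over internal nodes of |tips left - tips right|.
        stats["colless"] += abs(n_left - n_right)
        return n_left + n_right

    visit(tree)
    return stats


if __name__ == "__main__":
    balanced = (("A", "B"), ("C", "D"))
    caterpillar = ((("A", "B"), "C"), "D")
    print(tree_shape_stats(balanced))     # {'cherries': 2, 'sackin': 8, 'colless': 0}
    print(tree_shape_stats(caterpillar))  # {'cherries': 1, 'sackin': 9, 'colless': 3}
```

The fully unbalanced (caterpillar) tree scores higher on both imbalance indices than the balanced tree with the same number of tips, which is the kind of contrast such features are meant to capture.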
Affiliation(s)
- Cornelia Metzig
- Dept of Electronic Engineering and Computer Science, Queen Mary University of London, London, United Kingdom
- Oliver Ratmann
- Dept of Mathematics, Imperial College London, London, United Kingdom
- Caroline Colijn
- Dept of Mathematics, Simon Fraser University, Burnaby, Canada
19
Houldcroft CJ, Roy S, Morfopoulou S, Margetts BK, Depledge DP, Cudini J, Shah D, Brown JR, Romero EY, Williams R, Cloutman-Green E, Rao K, Standing JF, Hartley JC, Breuer J. Use of Whole-Genome Sequencing of Adenovirus in Immunocompromised Pediatric Patients to Identify Nosocomial Transmission and Mixed-Genotype Infection. J Infect Dis 2018; 218:1261-1271. [PMID: 29917114 DOI: 10.1093/infdis/jiy323] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 01/31/2018] [Accepted: 05/26/2018] [Indexed: 01/26/2023]
Abstract
Background: Adenoviruses are significant pathogens for the immunocompromised, arising from primary infection or reinfection. Serotyping is insufficient to support nosocomial transmission investigations. We investigate whether whole-genome sequencing (WGS) provides clinically relevant information on transmission among patients in a pediatric tertiary hospital. Methods: We developed a target-enriched adenovirus WGS technique for clinical samples and retrospectively sequenced 107 adenovirus-positive residual diagnostic samples, including viremias (>5 × 10^4 copies/mL), from 37 patients collected January 2011-March 2016. Whole-genome sequencing was used to determine genotype and for phylogenetic analysis. Results: Adenovirus sequences were recovered from 105 of 107 samples. Full genome sequences were recovered from all 20 nonspecies C samples and from 36 of 85 species C viruses, with partial genome sequences recovered from the rest. Whole-genome phylogenetic analysis suggested linkage of 3 genotype A31 cases and uncovered an unsuspected epidemiological link to an A31 infection first detected on the same ward 4 years earlier. In 9 samples from 1 patient who died, we identified a mixed genotype adenovirus infection. Conclusions: Adenovirus WGS from clinical samples is possible and useful for genotyping and molecular epidemiology. Whole-genome sequencing identified likely nosocomial transmission with greater resolution than conventional genotyping and distinguished between adenovirus disease due to single or multiple genotypes.
Affiliation(s)
- Charlotte J Houldcroft
- Infection, Immunity and Inflammation Section, UCL Great Ormond Street Institute of Child Health, University College London, United Kingdom; Division of Infection and Immunity, University College London, United Kingdom
- Sunando Roy
- Division of Infection and Immunity, University College London, United Kingdom
- Sofia Morfopoulou
- Division of Infection and Immunity, University College London, United Kingdom
- Ben K Margetts
- Infection, Immunity and Inflammation Section, UCL Great Ormond Street Institute of Child Health, University College London, United Kingdom; Centre for Computation, Mathematics and Physics in the Life Sciences and Experimental Biology, University College London, United Kingdom; Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
- Daniel P Depledge
- Infection, Immunity and Inflammation Section, UCL Great Ormond Street Institute of Child Health, University College London, United Kingdom
- Juliana Cudini
- Division of Infection and Immunity, University College London, United Kingdom
- Divya Shah
- Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
- Julianne R Brown
- Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
- Erika Yara Romero
- Division of Infection and Immunity, University College London, United Kingdom
- Rachel Williams
- Division of Infection and Immunity, University College London, United Kingdom
- Elaine Cloutman-Green
- Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
- Kanchan Rao
- Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
- Joseph F Standing
- Infection, Immunity and Inflammation Section, UCL Great Ormond Street Institute of Child Health, University College London, United Kingdom; Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
- John C Hartley
- Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
- Judith Breuer
- Infection, Immunity and Inflammation Section, UCL Great Ormond Street Institute of Child Health, University College London, United Kingdom; Division of Infection and Immunity, University College London, United Kingdom; Great Ormond Street Hospital for Children NHS Foundation Trust, London, United Kingdom
20
Theys K, Libin P, Pineda-Peña AC, Nowé A, Vandamme AM, Abecasis AB. The impact of HIV-1 within-host evolution on transmission dynamics. Curr Opin Virol 2017; 28:92-101. [PMID: 29275182 DOI: 10.1016/j.coviro.2017.12.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/04/2017] [Revised: 11/23/2017] [Accepted: 12/03/2017] [Indexed: 11/17/2022]
Abstract
The adaptive potential of HIV-1 is a vital mechanism to evade host immune responses and antiviral treatment. However, high evolutionary rates during persistent infection can impair transmission efficiency and alter disease progression in the new host, resulting in a delicate trade-off between within-host virulence and between-host infectiousness. This trade-off is visible in the disparity in evolutionary rates at within-host and between-host levels, and preferential transmission of ancestral donor viruses. Understanding the impact of within-host evolution for epidemiological studies is essential for the design of preventive and therapeutic measures. Herein, we review recent theoretical and experimental work that generated new insights into the complex link between within-host evolution and between-host fitness, revealing temporal and selective processes underlying the structure and dynamics of HIV-1 transmission.
Affiliation(s)
- Kristof Theys
- KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
- Pieter Libin
- KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Artificial Intelligence Lab, Department of Computer Science, Vrije Universiteit Brussel, Brussels, Belgium
- Andrea-Clemencia Pineda-Peña
- Molecular Biology and Immunology Department, Fundacion Instituto de Immunologia de Colombia (FIDIC), Basic Sciences Department, Universidad del Rosario, Bogota, Colombia; Global Health and Tropical Medicine, GHTM, Institute for Hygiene and Tropical Medicine, IHMT, University Nova de Lisboa, UNL, Lisbon, Portugal
- Ann Nowé
- Artificial Intelligence Lab, Department of Computer Science, Vrije Universiteit Brussel, Brussels, Belgium
- Anne-Mieke Vandamme
- KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
- Ana B Abecasis
- KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Global Health and Tropical Medicine, GHTM, Institute for Hygiene and Tropical Medicine, IHMT, University Nova de Lisboa, UNL, Lisbon, Portugal