1
Kupperman MD, Ke R, Leitner T. SEEPS: A Simulation Tool for Understanding Impacts of Contact Tracing on Epidemiological Inference from Phylogenetic Data. bioRxiv 2024:2023.11.30.567148. [PMID: 38076930 PMCID: PMC10705478 DOI: 10.1101/2023.11.30.567148] [Indexed: 12/23/2023]
Abstract
Motivation: Robust sampling methods are foundational to inferences using phylogenies. Yet the impact of using contact tracing, a type of non-uniform sampling used in public health applications such as infectious disease outbreak investigations, is not well understood. To understand how this non-uniform sampling method influences a recovered phylogeny, a new simulation tool is needed. Results: We developed a new simulation tool called SEEPS (Sequence Evolution and Epidemiological Process Simulator) that allows for the simulation of contact tracing and the resulting transmission tree, pathogen phylogeny, and corresponding virus genetic sequences. Importantly, SEEPS takes within-host evolution into account when generating pathogen phylogenies and sequences from transmission histories. Using SEEPS, we demonstrate that contact tracing can significantly impact the structure of the resulting tree, as described by popular tree statistics. We also examined real data from a 2007-2008 Swedish HIV-1 outbreak and the broader 1998-2010 European HIV-1 epidemic to highlight the differences in contact tracing and expected phylogenies. Aided by SEEPS, we show that the data collection of the Swedish outbreak was strongly influenced by contact tracing even after downsampling, while the broader European Union epidemic showed little evidence of universal contact tracing, agreeing with the known epidemiological information about sampling and spread. Overall, our results highlight the importance of including possible non-uniform sampling schemes when examining phylogenetic trees. For that, SEEPS serves as a useful tool to evaluate such impacts, thereby facilitating better phylogenetic inferences of the characteristics of a disease outbreak. Availability: SEEPS is available at github.com/MolEvolEpid/SEEPS.
Affiliation(s)
- Michael D. Kupperman
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico, United States of America
- Department of Applied Mathematics, University of Washington, Washington, United States of America
- Ruian Ke
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico, United States of America
- Thomas Leitner
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico, United States of America
2
Thompson A, Liebeskind BJ, Scully EJ, Landis MJ. Deep Learning and Likelihood Approaches for Viral Phylogeography Converge on the Same Answers Whether the Inference Model Is Right or Wrong. Syst Biol 2024; 73:183-206. [PMID: 38189575 DOI: 10.1093/sysbio/syad074] [Received: 02/15/2023] [Revised: 11/22/2023] [Accepted: 01/05/2024] [Indexed: 01/09/2024]
Abstract
Analysis of phylogenetic trees has become an essential tool in epidemiology. Likelihood-based methods fit models to phylogenies to draw inferences about the phylodynamics and history of viral transmission. However, these methods are often computationally expensive, which limits the complexity and realism of phylodynamic models and makes them ill-suited for informing policy decisions in real time during rapidly developing outbreaks. Likelihood-free methods using deep learning are pushing the boundaries of inference beyond these constraints. In this paper, we extend, compare, and contrast a recently developed deep learning method for likelihood-free inference from trees. We trained multiple deep neural networks using phylogenies from simulated outbreaks that spread among 5 locations and found they achieve close to the same levels of accuracy as Bayesian inference under the true simulation model. We compared robustness to model misspecification of a trained neural network to that of a Bayesian method. We found that both models had comparable performance, converging on similar biases. We also implemented a method of uncertainty quantification called conformalized quantile regression, which we demonstrate has similar patterns of sensitivity to model misspecification as Bayesian highest posterior density (HPD) intervals and greatly overlaps with HPDs, but has lower precision (is more conservative). Finally, we trained and tested a neural network against phylogeographic data from a recent study of the SARS-CoV-2 pandemic in Europe and obtained similar estimates of region-specific epidemiological parameters and the location of the common ancestor in Europe. Along with being as accurate and robust as likelihood-based methods, our trained neural networks are on average over 3 orders of magnitude faster after training. Our results support the notion that neural networks can be trained with simulated data to accurately mimic the good and bad statistical properties of the likelihood functions of generative phylogenetic models.
Affiliation(s)
- Ammon Thompson
- Participant in an Education Program Sponsored by U.S. Department of Defense (DOD) at the National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Erik J Scully
- National Geospatial-Intelligence Agency, Springfield, VA 22150, USA
- Michael J Landis
- Department of Biology, Washington University in St. Louis, Rebstock Hall, St. Louis, MO 63130, USA
3
Cummins B, Johnson K, Schneider JA, Del Vecchio N, Moshiri N, Wertheim JO, Goyal R, Skaathun B. Leveraging social networks for identification of people with HIV who are virally unsuppressed. AIDS 2024; 38:245-254. [PMID: 37890471 PMCID: PMC10843229 DOI: 10.1097/qad.0000000000003767] [Received: 06/07/2023] [Revised: 10/06/2023] [Accepted: 10/12/2023] [Indexed: 10/29/2023]
Abstract
OBJECTIVES: This study investigates primary peer-referral engagement (PRE) strategies to assess which strategy results in engaging higher numbers of people with HIV (PWH) who are virally unsuppressed. DESIGN: We develop a modeling study that simulates an HIV epidemic (transmission, disease progression, and viral evolution) over 6 years using an agent-based model followed by simulating PRE strategies. We investigate two PRE strategies where referrals are based on social network strategies (SNS) or sexual partner contact tracing (SPCT). METHODS: We parameterize, calibrate, and validate our study using data from Chicago on Black sexual minority men to assess these strategies for a population with high incidence and prevalence of HIV. For each strategy, we calculate the number of PWH recruited who are undiagnosed or out-of-care (OoC) and the number of direct or indirect transmissions. RESULTS: SNS and SPCT identified 256.5 [95% confidence interval (CI) 234-279] and 15 (95% CI 7-27) PWH, respectively. Of these, SNS identified 159 (95% CI 142-177) PWH OoC and 32 (95% CI 21-43) PWH undiagnosed compared with 9 (95% CI 3-18) and 2 (95% CI 0-5) for SPCT. SNS identified 15.5 (95% CI 6-25) and 7.5 (95% CI 2-11) indirect and direct transmission pairs, whereas SPCT identified 6 (95% CI 0-8) and 5 (95% CI 0-8), respectively. CONCLUSION: With no testing constraints, SNS is the more effective strategy to identify undiagnosed and OoC PWH. Neither strategy is successful at identifying sufficient indirect or direct transmission pairs to investigate transmission networks.
Affiliation(s)
- Breschine Cummins
- Department of Mathematical Sciences, Montana State University, Bozeman, MT
- Kara Johnson
- Department of Mathematical Sciences, Montana State University, Bozeman, MT
- John A. Schneider
- Department of Medicine, University of Chicago
- Department of Public Health Sciences, University of Chicago, Chicago, IL
- Joel O. Wertheim
- Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Ravi Goyal
- Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Britt Skaathun
- Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
4
Matteson NL, Hassler GW, Kurzban E, Schwab MA, Perkins SA, Gangavarapu K, Levy JI, Parker E, Pride D, Hakim A, De Hoff P, Cheung W, Castro-Martinez A, Rivera A, Veder A, Rivera A, Wauer C, Holmes J, Wilson J, Ngo SN, Plascencia A, Lawrence ES, Smoot EW, Eisner ER, Tsai R, Chacón M, Baer NA, Seaver P, Salido RA, Aigner S, Ngo TT, Barber T, Ostrander T, Fielding-Miller R, Simmons EH, Zazueta OE, Serafin-Higuera I, Sanchez-Alavez M, Moreno-Camacho JL, García-Gil A, Murphy Schafer AR, McDonald E, Corrigan J, Malone JD, Stous S, Shah S, Moshiri N, Weiss A, Anderson C, Aceves CM, Spencer EG, Hufbauer EC, Lee JJ, King AJ, Ramesh KS, Nguyen KN, Saucedo K, Robles-Sikisaka R, Fisch KM, Gonias SL, Birmingham A, McDonald D, Karthikeyan S, Martin NK, Schooley RT, Negrete AJ, Reyna HJ, Chavez JR, Garcia ML, Cornejo-Bravo JM, Becker D, Isaksson M, Washington NL, Lee W, Garfein RS, Luna-Ruiz Esparza MA, Alcántar-Fernández J, Henson B, Jepsen K, Olivares-Flores B, Barrera-Badillo G, Lopez-Martínez I, Ramírez-González JE, Flores-León R, Kingsmore SF, Sanders A, Pradenas A, White B, Matthews G, Hale M, McLawhon RW, Reed SL, Winbush T, McHardy IH, Fielding RA, Nicholson L, Quigley MM, Harding A, Mendoza A, Bakhtar O, Browne SH, Olivas Flores J, Rincon Rodríguez DG, Gonzalez Ibarra M, Robles Ibarra LC, Arellano Vera BJ, Gonzalez Garcia J, Harvey-Vera A, Knight R, Laurent LC, Yeo GW, Wertheim JO, Ji X, Worobey M, Suchard MA, Andersen KG, Campos-Romero A, Wohl S, Zeller M. Genomic surveillance reveals dynamic shifts in the connectivity of COVID-19 epidemics. Cell 2023; 186:5690-5704.e20. [PMID: 38101407 PMCID: PMC10795731 DOI: 10.1016/j.cell.2023.11.024] [Received: 03/12/2023] [Revised: 08/21/2023] [Accepted: 11/21/2023] [Indexed: 12/17/2023]
Abstract
The maturation of genomic surveillance in the past decade has enabled tracking of the emergence and spread of epidemics at an unprecedented level. During the COVID-19 pandemic, for example, genomic data revealed that local epidemics varied considerably in the frequency of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) lineage importation and persistence, likely due to a combination of COVID-19 restrictions and changing connectivity. Here, we show that local COVID-19 epidemics are driven by regional transmission, including across international boundaries, but can become increasingly connected to distant locations following the relaxation of public health interventions. By integrating genomic, mobility, and epidemiological data, we find abundant transmission occurring between both adjacent and distant locations, supported by dynamic mobility patterns. We find that changing connectivity significantly influences local COVID-19 incidence. Our findings demonstrate a complex meaning of "local" when investigating connected epidemics and emphasize the importance of collaborative interventions for pandemic prevention and mitigation.
Affiliation(s)
- Gabriel W Hassler
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Ezra Kurzban
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Madison A Schwab
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Sarah A Perkins
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Karthik Gangavarapu
- Department of Biomathematics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, CA, USA; Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Joshua I Levy
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Edyth Parker
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- David Pride
- Department of Pathology, University of California, San Diego, La Jolla, CA, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Abbas Hakim
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA; COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
- Peter De Hoff
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA; COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
- Willi Cheung
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA; COVID-19 Detection, Investigation, Surveillance, Clinical, and Outbreak Response, California Department of Public Health, Richmond, CA, USA
- Anelizze Castro-Martinez
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA; Sanford Consortium of Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
- Andrea Rivera
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Anthony Veder
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Ariana Rivera
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Cassandra Wauer
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Jacqueline Holmes
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Jedediah Wilson
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Shayla N Ngo
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Ashley Plascencia
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Elijah S Lawrence
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Elizabeth W Smoot
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Emily R Eisner
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Rebecca Tsai
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Marisol Chacón
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Nathan A Baer
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Phoebe Seaver
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Rodolfo A Salido
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Stefan Aigner
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Toan T Ngo
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Tom Barber
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Tyler Ostrander
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Rebecca Fielding-Miller
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, USA; Division of Infectious Disease and Global Public Health, University of California, San Diego, La Jolla, CA, USA
- Oscar E Zazueta
- Department of Epidemiology, Secretaria de Salud de Baja California, Tijuana, Baja California, Mexico
- Manuel Sanchez-Alavez
- Centro de Diagnostico COVID-19 UABC, Tijuana, Baja California, Mexico; Department of Molecular Medicine, Scripps Research, La Jolla, CA, USA
- Abraham García-Gil
- Clinical Laboratory Department, Salud Digna, A.C, Tijuana, Baja California, Mexico
- Eric McDonald
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
- Jeremy Corrigan
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
- John D Malone
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
- Sarah Stous
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
- Seema Shah
- County of San Diego Health and Human Services Agency, San Diego, CA, USA
- Niema Moshiri
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA
- Alana Weiss
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Catelyn Anderson
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Christine M Aceves
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Emily G Spencer
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Emory C Hufbauer
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Justin J Lee
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Alison J King
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Karthik S Ramesh
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Kelly N Nguyen
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Kieran Saucedo
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Kathleen M Fisch
- Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA; Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
- Steven L Gonias
- Department of Pathology, University of California, San Diego, La Jolla, CA, USA
- Amanda Birmingham
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Daniel McDonald
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Smruthi Karthikeyan
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Natasha K Martin
- Division of Infectious Disease and Global Public Health, University of California, San Diego, La Jolla, CA, USA
- Robert T Schooley
- Division of Infectious Disease and Global Public Health, University of California, San Diego, La Jolla, CA, USA
- Agustin J Negrete
- Facultad de Ciencias de la Salud Universidad Autonoma de Baja California Valle de Las Palmas, Tijuana, Baja California, Mexico
- Horacio J Reyna
- Facultad de Ciencias de la Salud Universidad Autonoma de Baja California Valle de Las Palmas, Tijuana, Baja California, Mexico
- Jose R Chavez
- Facultad de Ciencias de la Salud Universidad Autonoma de Baja California Valle de Las Palmas, Tijuana, Baja California, Mexico
- Maria L Garcia
- Facultad de Ciencias de la Salud Universidad Autonoma de Baja California Valle de Las Palmas, Tijuana, Baja California, Mexico
- Jose M Cornejo-Bravo
- Facultad de Ciencias Quimicas e Ingenieria, Universidad Autonoma de Baja California, Tijuana, Baja California, Mexico
- Richard S Garfein
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, USA
- Benjamin Henson
- Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
- Kristen Jepsen
- Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
- Beatriz Olivares-Flores
- Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE), Ciudad de México, CDMX, Mexico
- Gisela Barrera-Badillo
- Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE), Ciudad de México, CDMX, Mexico
- Irma Lopez-Martínez
- Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE), Ciudad de México, CDMX, Mexico
- José E Ramírez-González
- Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE), Ciudad de México, CDMX, Mexico
- Rita Flores-León
- Instituto de Diagnóstico y Referencia Epidemiológicos (InDRE), Ciudad de México, CDMX, Mexico
- Alison Sanders
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Allorah Pradenas
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Benjamin White
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Gary Matthews
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Matt Hale
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Ronald W McLawhon
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Sharon L Reed
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Terri Winbush
- Return to Learn, University of California, San Diego, La Jolla, CA, USA
- Sara H Browne
- Division of Infectious Disease and Global Public Health, University of California, San Diego, La Jolla, CA, USA; Specialist in Global Health, Encinitas, CA, USA
- Jocelyn Olivas Flores
- Facultad de Ciencias Quimicas e Ingenieria, Universidad Autonoma de Baja California, Tijuana, Baja California, Mexico; University of HealthMx, Tijuana, Baja California, Mexico
- Diana G Rincon Rodríguez
- University of HealthMx, Tijuana, Baja California, Mexico; Facultad de Medicina, Universidad Xochicalco, Tijuana, Baja California, Mexico
- Martin Gonzalez Ibarra
- University of HealthMx, Tijuana, Baja California, Mexico; Facultad de Medicina, Universidad Xochicalco, Tijuana, Baja California, Mexico
- Luis C Robles Ibarra
- University of HealthMx, Tijuana, Baja California, Mexico; Instituto de Seguridad y Servicios Sociales de los Trabajadores del Estado, Tijuana, Baja California, Mexico
- Betsy J Arellano Vera
- University of HealthMx, Tijuana, Baja California, Mexico; Instituto Mexicano del Seguro Social, Tijuana, Baja California, Mexico
- Jonathan Gonzalez Garcia
- University of HealthMx, Tijuana, Baja California, Mexico; SIMNSA, Tijuana, Baja California, Mexico
- Rob Knight
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA; Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Louise C Laurent
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Department of Obstetrics, Gynecology, and Reproductive Sciences, University of California, San Diego, La Jolla, CA, USA; Sanford Consortium of Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
- Gene W Yeo
- Expedited COVID Identification Environment (EXCITE) Laboratory, Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Sanford Consortium of Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA; Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Joel O Wertheim
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Xiang Ji
- Department of Mathematics, School of Science and Engineering, Tulane University, New Orleans, LA, USA
- Michael Worobey
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, USA
- Marc A Suchard
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Kristian G Andersen
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Abraham Campos-Romero
- Innovation and Research Department, Salud Digna, A.C, Tijuana, Baja California, Mexico
- Shirlee Wohl
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
- Mark Zeller
- Department of Immunology and Microbiology, Scripps Research, La Jolla, CA, USA
5
Goyal R, Carnegie N, Slipher S, Turk P, Little SJ, De Gruttola V. Estimating contact network properties by integrating multiple data sources associated with infectious diseases. Stat Med 2023; 42:3593-3615. [PMID: 37392149 PMCID: PMC10825904 DOI: 10.1002/sim.9816] [Received: 07/05/2022] [Revised: 05/09/2023] [Accepted: 05/19/2023] [Indexed: 07/03/2023]
Abstract
To effectively mitigate the spread of communicable diseases, it is necessary to understand the interactions that enable disease transmission among individuals in a population; we refer to the set of these interactions as a contact network. The structure of the contact network can have profound effects on both the spread of infectious diseases and the effectiveness of control programs. Therefore, understanding the contact network permits more efficient use of resources. Measuring the structure of the network, however, is a challenging problem. We present a Bayesian approach to integrate multiple data sources associated with the transmission of infectious diseases to more precisely and accurately estimate important properties of the contact network. An important aspect of the approach is the use of the congruence class models for networks. We conduct simulation studies modeling pathogens resembling SARS-CoV-2 and HIV to assess the method; subsequently, we apply our approach to HIV data from the University of California San Diego Primary Infection Resource Consortium. Based on simulation studies, we demonstrate that the integration of epidemiological and viral genetic data with risk behavior survey data can lead to large decreases in mean squared error (MSE) in contact network estimates compared to estimates based strictly on risk behavior information. This decrease in MSE is present even in settings where the risk behavior surveys contain measurement error. Through these simulations, we also highlight certain settings where the approach does not improve MSE.
Affiliation(s)
- Ravi Goyal
- Division of Infectious Diseases and Global Public Health, University of California San Diego, San Diego, California, USA
- Sally Slipher
- Department of Mathematical Sciences, Montana State University, Bozeman, Montana, USA
- Philip Turk
- Department of Data Science, University of Mississippi Medical Center, Jackson, Mississippi, USA
- Susan J Little
- Division of Infectious Diseases and Global Public Health, University of California San Diego, La Jolla, California, USA
- Victor De Gruttola
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
6
Ke Z, Vikalo H. Graph-Based Reconstruction and Analysis of Disease Transmission Networks Using Viral Genomic Data. J Comput Biol 2023. [PMID: 37347892 DOI: 10.1089/cmb.2022.0373] [Indexed: 06/24/2023]
Abstract
Understanding the patterns of viral disease transmissions helps establish public health policies and aids in controlling and ending a disease outbreak. Classical methods for studying disease transmission dynamics that rely on epidemiological data, such as times of sample collection and duration of exposure intervals, struggle to provide desired insight due to limited informativeness of such data. A more precise characterization of disease transmissions may be acquired from sequencing data that reveal genetic distance between viral genomes in patient samples. Indeed, genetic distance between viral strains present in hosts contains valuable information about transmission history, thus motivating the design of methods that rely on genomic data to reconstruct a directed disease transmission network, detect transmission clusters, and identify significant network nodes (e.g., super-spreaders). In this article, we present a novel end-to-end framework for the analysis of viral transmissions utilizing viral genomic (sequencing) data. The proposed framework groups infected hosts into transmission clusters based on the reconstructed viral strains infecting them; the genetic distance between a pair of hosts is calculated using Earth Mover's Distance, and further used to infer transmission direction between the hosts. To quantify the significance of a host in the transmission network, the importance score is calculated by a graph convolutional autoencoder. The viral transmission network is represented by a directed minimum spanning tree utilizing Edmonds' algorithm modified to incorporate constraints on the importance scores of the hosts. The proposed framework outperforms state-of-the-art techniques for the analysis of viral transmission dynamics in several experiments on semiexperimental as well as experimental data.
Affiliation(s)
- Ziqi Ke
  - Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas, USA
- Haris Vikalo
  - Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas, USA
|
7
|
Freire B, Ladra S, Parama JR, Salmela L. ViQUF: De Novo Viral Quasispecies Reconstruction Using Unitig-Based Flow Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1550-1562. [PMID: 35853050 DOI: 10.1109/tcbb.2022.3190282] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
During viral infection, intrahost mutation and recombination can lead to significant evolution, resulting in a population of viruses that harbor multiple haplotypes. The task of reconstructing these haplotypes from short-read sequencing data is called viral quasispecies assembly, and it can be categorized as a multiassembly problem. We consider the de novo version of the problem, where no reference is available. We present ViQUF, a de novo viral quasispecies assembler that addresses haplotype assembly and quantification. ViQUF obtains a first draft of the assembly graph from a de Bruijn graph. Then, solving a min-cost flow over a flow network built for each pair of adjacent vertices based on their paired-end information creates an approximate paired assembly graph with suggested frequency values as edge labels, which is the first frequency estimation. Then, original haplotypes are obtained through a greedy path reconstruction guided by a min-cost flow solution in the approximate paired assembly graph. ViQUF outputs the contigs with their frequency estimations. Results on real and simulated data show that ViQUF is at least four times faster than previous methods while using at most half of the memory, and that it maintains, and in some cases exceeds, the high quality of assembly and frequency estimation of overlap graph-based methodologies, which are known to be more accurate but slower than the de Bruijn graph-based approaches.
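The first step named in the abstract above, drafting an assembly graph from a de Bruijn graph, can be sketched as follows. This is a generic textbook construction under stated assumptions, not ViQUF's code: reads are plain strings, there is no error correction or unitig compaction, and the function name `de_bruijn_edges` is hypothetical.

```python
from collections import Counter


def de_bruijn_edges(reads, k):
    """Build a de Bruijn graph as a multiset of edges.

    Each k-mer contributes one edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix; the count records how often that k-mer was seen.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


# Two hypothetical overlapping reads.
edges = de_bruijn_edges(["ACGTC", "CGTCA"], k=3)
for (src, dst), count in sorted(edges.items()):
    print(src, "->", dst, "seen", count, "time(s)")
```

The edge counts are the raw k-mer multiplicities; the flow-based frequency estimation described in the abstract would operate on a graph derived from these.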
|
8
|
Zhang C, Bzikadze AV, Safonova Y, Mirarab S. A scalable model for simulating multi-round antibody evolution and benchmarking of clonal tree reconstruction methods. Front Immunol 2022; 13:1014439. [PMID: 36618367 PMCID: PMC9815712 DOI: 10.3389/fimmu.2022.1014439] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/08/2022] [Accepted: 10/26/2022] [Indexed: 12/12/2022] Open
Abstract
Affinity maturation (AM) of B cells through somatic hypermutations (SHMs) enables the immune system to evolve to recognize diverse pathogens. The accumulation of SHMs leads to the formation of clonal lineages of antibody-secreting B cells that have evolved from a common naïve B cell. Advances in high-throughput sequencing have enabled deep scans of B cell receptor repertoires, paving the way for reconstructing clonal trees. However, it is not clear if clonal trees, which capture microevolutionary time scales, can be reconstructed using traditional phylogenetic reconstruction methods with adequate accuracy. In fact, several clonal tree reconstruction methods have been developed to fix supposed shortcomings of phylogenetic methods. Nevertheless, no consensus has been reached regarding the relative accuracy of these methods, partially because evaluation is challenging. Benchmarking the performance of existing methods and developing better methods would both benefit from realistic models of clonal lineage evolution specifically designed for emulating B cell evolution. In this paper, we propose a model of B cell clonal lineage evolution and use this model to benchmark several existing clonal tree reconstruction methods. Our model, designed to be extensible, has several features: by evolving the clonal tree and sequences simultaneously, it allows modeling selective pressure due to changes in affinity binding; it enables scalable simulations of large numbers of cells; it enables several rounds of infection by an evolving pathogen; and it models building of memory. In addition, we also suggest a set of metrics for comparing clonal trees and measuring their properties. Our results show that while maximum likelihood phylogenetic reconstruction methods can fail to capture key features of clonal tree expansion if applied naively, a simple post-processing of their results, where short branches are contracted, leads to inferences that are better than alternative methods.
Affiliation(s)
- Chao Zhang
  - Bioinformatics and Systems Biology, University of California, San Diego, San Diego, CA, United States
- Andrey V. Bzikadze
  - Bioinformatics and Systems Biology, University of California, San Diego, San Diego, CA, United States
- Yana Safonova
  - Computer Science and Engineering Department, University of California, San Diego, San Diego, CA, United States
- Siavash Mirarab (corresponding author)
  - Electrical and Computer Engineering Department, University of California, San Diego, San Diego, CA, United States
|
9
|
Optimized phylogenetic clustering of HIV-1 sequence data for public health applications. PLoS Comput Biol 2022; 18:e1010745. [PMID: 36449514 PMCID: PMC9744331 DOI: 10.1371/journal.pcbi.1010745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 12/12/2022] [Accepted: 11/17/2022] [Indexed: 12/02/2022] Open
Abstract
Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we show a method which selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta) and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging from 0.007 to 0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008 - 0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than those typically used in clustering studies.
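The model comparison described in the abstract above rests on the AIC difference between two Poisson models of cluster growth. The sketch below shows only that arithmetic, AIC = 2k - 2 ln L, for a made-up data set. Everything in it is an assumption for illustration: the counts, the fitted means, the parameter counts, and the function names are hypothetical, and the regression fitting itself is not shown.

```python
import math


def poisson_loglik(counts, means):
    """Poisson log-likelihood of observed counts given fitted means."""
    return sum(
        y * math.log(mu) - mu - math.lgamma(y + 1)
        for y, mu in zip(counts, means)
    )


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik


# Hypothetical cluster growth counts at one branch-length threshold.
growth = [0, 1, 3, 5]
# Hypothetical fitted means from a simpler and a richer model.
null_means = [2.25, 2.25, 2.25, 2.25]  # assumed 1 parameter
alt_means = [0.5, 0.5, 4.0, 4.0]       # assumed 2 parameters

delta_aic = aic(poisson_loglik(growth, null_means), 1) - aic(
    poisson_loglik(growth, alt_means), 2
)
print(round(delta_aic, 4))
```

Repeating this over a grid of thresholds and taking the threshold with the largest difference mirrors the selection rule stated in the abstract.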
|
10
|
Skums P, Mohebbi F, Tsyvina V, Baykal PI, Nemira A, Ramachandran S, Khudyakov Y. SOPHIE: Viral outbreak investigation and transmission history reconstruction in a joint phylogenetic and network theory framework. Cell Syst 2022; 13:844-856.e4. [PMID: 36265470 PMCID: PMC9590096 DOI: 10.1016/j.cels.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 07/05/2022] [Accepted: 07/19/2022] [Indexed: 01/26/2023]
Abstract
Genomic epidemiology is now widely used for viral outbreak investigations. Still, this methodology faces many challenges. First, few methods account for intra-host viral diversity. Second, the maximum parsimony principle continues to be employed for phylogenetic inference of transmission histories, even though maximum likelihood or Bayesian models are usually more consistent. Third, many methods utilize case-specific data, such as sampling times or infection exposure intervals. This impedes study of persistent infections in vulnerable groups, where such information has limited use. Finally, most methods implicitly assume that transmission events are independent, although common source outbreaks violate this assumption. We propose a maximum likelihood framework, SOPHIE, based on the integration of phylogenetic and random graph models. It infers transmission networks from viral phylogenies and expected properties of inter-host social networks modeled as random graphs with given expected degree distributions. SOPHIE is scalable, accounts for intra-host diversity, and accurately infers transmissions without case-specific epidemiological data.
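The phrase "random graphs with given expected degree distributions" in the abstract above can be made concrete with one standard construction, often called the Chung-Lu model, in which nodes u and v are joined with probability w_u * w_v / sum(w). The abstract does not say whether SOPHIE uses exactly this model, so the sketch is an assumption for illustration; the function name `chung_lu_graph` is hypothetical, and self-loops are omitted, which makes realized expected degrees slightly smaller than the targets.

```python
import random
from itertools import combinations


def chung_lu_graph(expected_degrees, seed=0):
    """Sample an undirected graph whose node degrees match the given
    expected degrees approximately (Chung-Lu style, no self-loops)."""
    total = sum(expected_degrees)
    if total == 0:
        return []  # no expected edges at all
    rng = random.Random(seed)
    edges = []
    for u, v in combinations(range(len(expected_degrees)), 2):
        # Edge probability is proportional to the product of the weights.
        p = min(1.0, expected_degrees[u] * expected_degrees[v] / total)
        if rng.random() < p:
            edges.append((u, v))
    return edges


# Hypothetical expected degrees: one well-connected host, several peripheral ones.
sampled_edges = chung_lu_graph([4, 1, 1, 1, 1], seed=42)
print(sampled_edges)
```

A likelihood-based method could score a candidate transmission network by how probable its edges are under such a model; that scoring step is not shown here.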
Affiliation(s)
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA.
- Fatemeh Mohebbi
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Vyacheslav Tsyvina
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Pelin Icer Baykal
  - Department of Biosystems Science & Engineering, ETH Zurich, Basel, Switzerland
- Alina Nemira
  - Department of Computer Science, Georgia State University, Atlanta, GA, USA
- Sumathi Ramachandran
  - Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Yury Khudyakov
  - Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta, GA, USA
|
11
|
Pekar JE, Magee A, Parker E, Moshiri N, Izhikevich K, Havens JL, Gangavarapu K, Malpica Serrano LM, Crits-Christoph A, Matteson NL, Zeller M, Levy JI, Wang JC, Hughes S, Lee J, Park H, Park MS, Ching KZY, Lin RTP, Mat Isa MN, Noor YM, Vasylyeva TI, Garry RF, Holmes EC, Rambaut A, Suchard MA, Andersen KG, Worobey M, Wertheim JO. The molecular epidemiology of multiple zoonotic origins of SARS-CoV-2. Science 2022; 377:960-966. [PMID: 35881005 PMCID: PMC9348752 DOI: 10.1126/science.abp8337] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Received: 03/03/2022] [Accepted: 07/18/2022] [Indexed: 01/08/2023]
Abstract
Understanding the circumstances that lead to pandemics is important for their prevention. We analyzed the genomic diversity of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) early in the coronavirus disease 2019 (COVID-19) pandemic. We show that SARS-CoV-2 genomic diversity before February 2020 likely comprised only two distinct viral lineages, denoted "A" and "B." Phylodynamic rooting methods, coupled with epidemic simulations, reveal that these lineages were the result of at least two separate cross-species transmission events into humans. The first zoonotic transmission likely involved lineage B viruses around 18 November 2019 (23 October to 8 December), and the separate introduction of lineage A likely occurred within weeks of this event. These findings indicate that it is unlikely that SARS-CoV-2 circulated widely in humans before November 2019 and define the narrow window between when SARS-CoV-2 first jumped into humans and when the first cases of COVID-19 were reported. As with other coronaviruses, SARS-CoV-2 emergence likely resulted from multiple zoonotic events.
Affiliation(s)
- Jonathan E. Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Andrew Magee
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Edyth Parker
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Niema Moshiri
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Katherine Izhikevich
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Mathematics, University of California San Diego, La Jolla, CA 92093, USA
- Jennifer L. Havens
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
- Karthik Gangavarapu
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Alexander Crits-Christoph
  - W. Harry Feinstone Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland 21205, USA
- Nathaniel L. Matteson
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Mark Zeller
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Joshua I. Levy
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
- Jade C. Wang
  - New York City Public Health Laboratory, New York City Department of Health and Mental Hygiene, New York, NY 11101, USA
- Scott Hughes
  - New York City Public Health Laboratory, New York City Department of Health and Mental Hygiene, New York, NY 11101, USA
- Jungmin Lee
  - Department of Microbiology, Institute for Viral Diseases, Biosafety Center, College of Medicine, Korea University, Seoul, South Korea
- Heedo Park
  - Department of Microbiology, Institute for Viral Diseases, Biosafety Center, College of Medicine, Korea University, Seoul, South Korea
  - BK21 Graduate Program, Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Man-Seong Park
  - Department of Microbiology, Institute for Viral Diseases, Biosafety Center, College of Medicine, Korea University, Seoul, South Korea
  - BK21 Graduate Program, Department of Biomedical Sciences, Korea University College of Medicine, Seoul, 02841, Republic of Korea
- Raymond Tzer Pin Lin
  - National Public Health Laboratory, National Centre for Infectious Diseases, Singapore
- Mohd Noor Mat Isa
  - Malaysia Genome and Vaccine Institute, Jalan Bangi, 43000 Kajang, Selangor, Malaysia
- Yusuf Muhammad Noor
  - Malaysia Genome and Vaccine Institute, Jalan Bangi, 43000 Kajang, Selangor, Malaysia
- Tetyana I. Vasylyeva
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Robert F. Garry
  - Tulane University, School of Medicine, Department of Microbiology and Immunology, New Orleans, LA 70112, USA
  - Zalgen Labs, LCC, Frederick, MD 21703 USA
  - Global Virus Network (GVN), Baltimore, MD 21201, USA
- Edward C. Holmes
  - Sydney Institute for Infectious Diseases, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, NSW 2006, Australia
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, Edinburgh, EH9 3FL, UK
- Marc A. Suchard
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA 90095, USA
- Kristian G. Andersen
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037, USA
  - Scripps Research Translational Institute, La Jolla, CA 92037, USA
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Joel O. Wertheim
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
|
12
|
Shchur V, Spirin V, Sirotkin D, Burovski E, De Maio N, Corbett-Detig R. VGsim: Scalable viral genealogy simulator for global pandemic. PLoS Comput Biol 2022; 18:e1010409. [PMID: 36001646 PMCID: PMC9447924 DOI: 10.1371/journal.pcbi.1010409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Revised: 09/06/2022] [Accepted: 07/18/2022] [Indexed: 11/24/2022] Open
Abstract
Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions in the world. More than 5.5 million viral sequences are publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator, VGsim, which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run a coalescent-like approach generates a tree genealogy of samples conditioning on the population-level events chain generated during the forward run. Our software can model complex population structure, epistasis and immunity escape. We develop a fast and flexible simulation software package VGsim for modeling epidemiological processes and generating genealogies of large pathogen samples. The software takes into account host population structure, pathogen evolution, host immunity and some other epidemiological aspects. The computational efficiency of the package makes it possible to simulate genealogies of tens of millions of samples, which is important, e.g., for SARS-CoV-2 genome studies.
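The forward phase described in the abstract above generates a chain of population-level events with a Gillespie-type algorithm. The sketch below shows the plain (non-hierarchical) Gillespie algorithm for a single well-mixed SIR population, which is a simplification chosen for illustration and not VGsim's implementation; the function name `gillespie_sir` and all parameter values are hypothetical.

```python
import random


def gillespie_sir(s, i, r, beta, gamma, t_max, seed=0):
    """Simulate SIR event times with the direct-method Gillespie algorithm.

    Returns a list of (time, S, I, R) tuples, one per event,
    starting with the initial state at time 0.
    """
    rng = random.Random(seed)
    n = s + i + r
    t = 0.0
    events = [(t, s, i, r)]
    while i > 0:
        rate_infection = beta * s * i / n
        rate_recovery = gamma * i
        total_rate = rate_infection + rate_recovery
        if total_rate == 0:
            break  # nothing can happen any more
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # Choose which event fires, proportionally to its rate.
        if rng.random() < rate_infection / total_rate:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        events.append((t, s, i, r))
    return events


# Hypothetical outbreak: 99 susceptible hosts and 1 infected host.
trajectory = gillespie_sir(99, 1, 0, beta=0.3, gamma=0.1, t_max=50.0, seed=1)
print(len(trajectory) - 1, "events; final state:", trajectory[-1])
```

The recorded event chain is the kind of input a backward, coalescent-like pass would condition on when building a genealogy of sampled hosts.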
Affiliation(s)
- Vladimir Shchur (corresponding author)
  - International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Vadim Spirin
  - International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Dmitry Sirotkin
  - International laboratory of statistical and computational genomics, HSE University, Moscow, Russia
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Russell Corbett-Detig
  - Department of Biomolecular Engineering and Genomics Institute, UC Santa Cruz, California, United States of America
|
13
|
Miller RL, McLaughlin A, Liang RH, Harding J, Wong J, Le AQ, Brumme CJ, Montaner JSG, Joy JB. Phylogenetic prioritization of HIV-1 transmission clusters with viral lineage-level diversification rates. Evol Med Public Health 2022; 10:305-315. [PMID: 35899097 PMCID: PMC9311310 DOI: 10.1093/emph/eoac026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Accepted: 07/07/2022] [Indexed: 11/14/2022] Open
Abstract
Background and objectives
Public health officials faced with a large number of transmission clusters require a rapid, scalable and unbiased way to prioritize distribution of limited resources to maximize benefits. We hypothesize that transmission cluster prioritization based on phylogenetically derived lineage-level diversification rates will perform as well as or better than commonly used growth-based prioritization measures, without need for historical data or subjective interpretation.
Methodology
9822 HIV pol sequences collected during routine drug resistance genotyping were used alongside simulated sequence data to infer sets of phylogenetic transmission clusters via patristic distance threshold. Prioritized clusters inferred from empirical data were compared to those prioritized by the current public health protocols. Prioritization of simulated clusters was evaluated based on correlation of a given prioritization measure with future cluster growth, as well as the number of direct downstream transmissions from cluster members.
Results
Empirical data suggest diversification rate-based measures perform comparably to growth-based measures in recreating public health prioritization choices. However, unbiased simulated data reveal that phylogenetic diversification rate-based measures perform better in predicting future cluster growth relative to growth-based measures, particularly long-term growth. Diversification rate-based measures also display advantages over growth-based measures in highlighting groups with greater future transmission events compared to random groups of the same size. Furthermore, diversification rate measures were notably more robust to effects of decreased sampling proportion.
Conclusions and implications
Our findings indicate diversification rate-based measures frequently outperform growth-based measures in predicting future cluster growth and offer several additional advantages beneficial to optimizing the public health prioritization process.
Affiliation(s)
- Rachel L Miller
  - Molecular Epidemiology and Evolutionary Genetics, British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada
  - Bioinformatics Program, University of British Columbia, Vancouver, Canada
- Angela McLaughlin
  - Molecular Epidemiology and Evolutionary Genetics, British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada
  - Bioinformatics Program, University of British Columbia, Vancouver, Canada
- Richard H Liang
  - Laboratory Program, British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada
- Jason Wong
  - Clinical Prevention Services, British Columbia Centre for Disease Control, Vancouver, Canada
- Anh Q Le
  - Laboratory Program, British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada
- Chanson J Brumme
  - Laboratory Program, British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada
  - Department of Medicine, University of British Columbia, Vancouver, Canada
- Julio S G Montaner
  - Department of Medicine, University of British Columbia, Vancouver, Canada
  - British Columbia Centre for Excellence in HIV/AIDS, Vancouver, Canada
- Jeffrey B Joy
  - Corresponding author. Molecular Epidemiology and Evolutionary Genetics, BC Centre for Excellence in HIV/AIDS, 615-1033 Davie St, Vancouver, BC, V6E 1M5, Canada. Tel: +1-(604)-368-5569
|
14
|
Dhar S, Zhang C, Măndoiu II, Bansal MS. TNet: Transmission Network Inference Using Within-Host Strain Diversity and its Application to Geographical Tracking of COVID-19 Spread. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:230-242. [PMID: 34255632 PMCID: PMC8956368 DOI: 10.1109/tcbb.2021.3096455] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/06/2020] [Revised: 07/03/2021] [Accepted: 07/08/2021] [Indexed: 06/13/2023]
Abstract
The inference of disease transmission networks is an important problem in epidemiology. One popular approach for building transmission networks is to reconstruct a phylogenetic tree using sequences from disease strains sampled from infected hosts and infer transmissions based on this tree. However, most existing phylogenetic approaches for transmission network inference are highly computationally intensive and cannot take within-host strain diversity into account. Here, we introduce a new phylogenetic approach for inferring transmission networks, TNet, that addresses these limitations. TNet uses multiple strain sequences from each sampled host to infer transmissions and is simpler and more accurate than existing approaches. Furthermore, TNet is highly scalable and able to distinguish between ambiguous and unambiguous transmission inferences. We evaluated TNet on a large collection of 560 simulated transmission networks of various sizes and diverse host, sequence, and transmission characteristics, as well as on 10 real transmission datasets with known transmission histories. Our results show that TNet outperforms two other recently developed methods, phyloscanner and SharpTNI, that also consider within-host strain diversity. We also applied TNet to a large collection of SARS-CoV-2 genomes sampled from infected individuals in many countries around the world, demonstrating how our inference framework can be adapted to accurately infer geographical transmission networks. TNet is freely available from https://compbio.engr.uconn.edu/software/TNet/.
Affiliation(s)
- Saurav Dhar
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Chengchen Zhang
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Ion I. Măndoiu
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
- Mukul S. Bansal
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT 06269, USA
|
15
|
Almaraz K, Jang T, Lewis M, Ngo T, Song M, Moshiri N. SEPIA: simulation-based evaluation of prioritization algorithms. BMC Med Inform Decis Mak 2021; 21:177. [PMID: 34082739 PMCID: PMC8173910 DOI: 10.1186/s12911-021-01536-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2021] [Accepted: 05/23/2021] [Indexed: 11/18/2022] Open
Abstract
Background The ability to prioritize people living with HIV (PLWH) by risk of future transmissions could aid public health officials in optimizing epidemiological intervention. While methods exist to perform such prioritization based on molecular data, their effectiveness and accuracy are poorly understood, and it is unclear how one can directly compare the accuracy of different methods. We introduce SEPIA (Simulation-based Evaluation of PrIoritization Algorithms), a novel simulation-based framework for determining the effectiveness of prioritization algorithms. SEPIA expands upon prior related work by defining novel metrics of effectiveness with which to compare prioritization techniques, as well as by creating a simulation-based tool with which to perform such effectiveness comparisons. Under several metrics of effectiveness that we propose, we compare two existing prioritization approaches: one phylogenetic (ProACT) and one distance-based (growth of HIV-TRACE transmission clusters). Results Using all proposed metrics, ProACT consistently slightly outperformed the transmission cluster growth approach. However, both methods consistently performed just marginally better than random, suggesting that there is significant room for improvement in prioritization tools. Conclusion We hope that, by providing ways to quantify the effectiveness of prioritization methods in simulation, SEPIA will aid researchers in developing novel risk prioritization tools for PLWH.
Affiliation(s)
- Kimberly Almaraz
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Tyler Jang
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- McKenna Lewis
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Titan Ngo
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Miranda Song
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Niema Moshiri
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
|
16
|
Agent-based evolving network modeling: a new simulation method for modeling low prevalence infectious diseases. Health Care Manag Sci 2021; 24:623-639. [PMID: 33991293 PMCID: PMC8459606 DOI: 10.1007/s10729-021-09558-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2019] [Accepted: 02/19/2021] [Indexed: 11/09/2022]
Abstract
Agent-based network modeling (ABNM) simulates each person at the individual level as an agent of the simulation, and uses network generation algorithms to generate the network of contacts between individuals. ABNMs are suitable for simulating individual-level dynamics of infectious diseases, especially for diseases such as HIV that spread through close contacts within intricate contact networks. However, because ABNMs simulate a scaled version of the full population, consisting of all infected and susceptible persons, they are computationally infeasible for studying certain questions in low prevalence diseases such as HIV. We present a new simulation technique, agent-based evolving network modeling (ABENM), which includes a new network generation algorithm, Evolving Contact Network Algorithm (ECNA), for generating scale-free networks. ABENM simulates only infected persons and their immediate contacts at the individual level as agents of the simulation, and uses the ECNA for generating the contact structures between these individuals. All other susceptible persons are modeled using a compartmental modeling structure. Thus, ABENM has a hybrid agent-based and compartmental modeling structure. The ECNA uses concepts from graph theory for generating scale-free networks. Multiple social networks, including sexual partnership networks and needle sharing networks among injecting drug users, are known to follow a scale-free network structure. Numerical results comparing ABENM with ABNM estimations for disease trajectories of hypothetical diseases transmitted on scale-free contact networks are promising for application to low prevalence diseases.
|
17
|
Pekar J, Worobey M, Moshiri N, Scheffler K, Wertheim JO. Timing the SARS-CoV-2 index case in Hubei province. Science 2021; 372:412-417. [PMID: 33737402 PMCID: PMC8139421 DOI: 10.1126/science.abf8003] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 11/20/2020] [Accepted: 03/15/2021] [Indexed: 12/14/2022]
Abstract
Understanding when severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged is critical to evaluating our current approach to monitoring novel zoonotic pathogens and understanding the failure of early containment and mitigation efforts for COVID-19. We used a coalescent framework to combine retrospective molecular clock inference with forward epidemiological simulations to determine how long SARS-CoV-2 could have circulated before the time of the most recent common ancestor of all sequenced SARS-CoV-2 genomes. Our results define the period between mid-October and mid-November 2019 as the plausible interval when the first case of SARS-CoV-2 emerged in Hubei province, China. By characterizing the likely dynamics of the virus before it was discovered, we show that more than two-thirds of SARS-CoV-2-like zoonotic events would be self-limited, dying out without igniting a pandemic. Our findings highlight the shortcomings of zoonosis surveillance approaches for detecting highly contagious pathogens with moderate mortality rates.
Affiliation(s)
- Jonathan Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Niema Moshiri
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Joel O Wertheim
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
18
Moshiri N, Smith DM, Mirarab S. HIV Care Prioritization Using Phylogenetic Branch Length. J Acquir Immune Defic Syndr 2021; 86:626-637. [PMID: 33394616 PMCID: PMC7933099 DOI: 10.1097/qai.0000000000002612] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/14/2020] [Accepted: 12/14/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND The structure of the HIV transmission networks can be dictated by just a few individuals. Public health intervention, such as ensuring people living with HIV adhere to antiretroviral therapy and remain virally suppressed, can help control the spread of the virus. However, such intervention requires using limited public health resource allocations. Determining which individuals are most at risk of transmitting HIV could allow public health officials to focus their limited resources on these individuals. SETTING Molecular epidemiology can help prioritize people living with HIV by patterns of transmission inferred from their sampled viral sequences. Such prioritization has been previously suggested and performed by monitoring cluster growth. In this article, we introduce Prioritization using AnCesTral edge lengths (ProACT), a phylogenetic approach for prioritizing individuals living with HIV. METHODS ProACT starts from a phylogeny inferred from sequence data and orders individuals according to their terminal branch length, breaking ties using ancestral branch lengths. We evaluated ProACT on a real data set of 926 HIV-1 subtype B pol data obtained in San Diego between 2005 and 2014 and a simulation data set modeling the same epidemic. Prioritization methods are compared by their ability to predict individuals who transmit most after the prioritization. RESULTS Across all simulation conditions and most real data sampling conditions, ProACT outperformed monitoring cluster growth for multiple metrics of prioritization efficacy. CONCLUSION The simple strategy used by ProACT improves the effectiveness of prioritization compared with state-of-the-art methods that rely on monitoring the growth of transmission clusters defined based on genetic distance.
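The ordering rule described in METHODS can be illustrated with a minimal Python sketch. It is not the ProACT implementation: the dictionary-based tree encoding, the function names, and the choice to rank shorter branches first are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tree encoding (not from the paper):
# node name -> (parent name or None for the root, length of the edge above the node).
Tree = Dict[str, Tuple[Optional[str], float]]


def ancestral_lengths(tree: Tree, leaf: str) -> Tuple[float, ...]:
    """Edge lengths from the leaf's terminal branch upward, stopping below the root."""
    lengths = []
    node: Optional[str] = leaf
    while node is not None:
        parent, length = tree[node]
        if parent is not None:  # the root has no edge above it
            lengths.append(length)
        node = parent
    return tuple(lengths)


def prioritize(tree: Tree) -> List[str]:
    """Order leaves by terminal branch length, breaking ties with ancestral branches.

    Assumption: shorter branches are ranked first. Names are used as a final
    tie-break so that the ordering is deterministic.
    """
    parents = {parent for parent, _ in tree.values() if parent is not None}
    leaves = [name for name in tree if name not in parents]
    return sorted(leaves, key=lambda leaf: (ancestral_lengths(tree, leaf), leaf))
```

Comparing tuples of branch lengths handles the tie-breaking directly: two leaves with equal terminal branches are separated by the branch above their parent, then the next one up, and so on.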
Affiliation(s)
- Niema Moshiri
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, 92093, USA
- Davey M. Smith
  - Department of Medicine, University of California, San Diego, La Jolla, 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, 92093, USA
19
Angevaare J, Feng Z, Deardon R. Inference of latent event times and transmission networks in individual level infectious disease models. Spat Spatiotemporal Epidemiol 2021; 37:100410. [PMID: 33980405 DOI: 10.1016/j.sste.2021.100410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2020] [Revised: 01/20/2021] [Accepted: 01/28/2021] [Indexed: 10/22/2022]
Abstract
Transmission networks indicate who-infected-whom in epidemics. Reconstruction of transmission networks is invaluable in applying and developing effective control strategies for infectious diseases. We introduce transmission network individual level models (TN-ILMs), a competing-risk, continuous time extension to individual level model framework for infectious diseases of Deardon et al. (2010). Through simulation study using a Julia language software package, Pathogen.jl, we explore the models with respect to their ability to jointly infer latent event times, latent disease transmission networks, and the TN-ILM parameters. We find good parameter, event time, and transmission network inference, with enhanced performance for inference of transmission networks in epidemic simulations that have higher spatial signals in their infectivity kernel. Finally, an application of a TN-ILM to data from a greenhouse experiment on the spread of tomato spotted wilt virus is presented.
Affiliation(s)
- Zeny Feng
  - University of Guelph, Canada. https://zfeng.uoguelph.ca
- Rob Deardon
  - University of Calgary, Canada. https://people.ucalgary.ca/~robert.deardon/
20
Pekar J, Worobey M, Moshiri N, Scheffler K, Wertheim JO. Timing the SARS-CoV-2 Index Case in Hubei Province. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.11.20.392126. [PMID: 33269353 PMCID: PMC7709179 DOI: 10.1101/2020.11.20.392126] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022]
Abstract
Understanding when SARS-CoV-2 emerged is critical to evaluating our current approach to monitoring novel zoonotic pathogens and understanding the failure of early containment and mitigation efforts for COVID-19. We employed a coalescent framework to combine retrospective molecular clock inference with forward epidemiological simulations to determine how long SARS-CoV-2 could have circulated prior to the time of the most recent common ancestor. Our results define the period between mid-October and mid-November 2019 as the plausible interval when the first case of SARS-CoV-2 emerged in Hubei province. By characterizing the likely dynamics of the virus before it was discovered, we show that over two-thirds of SARS-CoV-2-like zoonotic events would be self-limited, dying out without igniting a pandemic. Our findings highlight the shortcomings of zoonosis surveillance approaches for detecting highly contagious pathogens with moderate mortality rates.
Affiliation(s)
- Jonathan Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Niema Moshiri
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Joel O. Wertheim
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
21
Bbosa N, Ssemwanga D, Kaleebu P. Short Communication: Choosing the Right Program for the Identification of HIV-1 Transmission Networks from Nucleotide Sequences Sampled from Different Populations. AIDS Res Hum Retroviruses 2020; 36:948-951. [PMID: 32693608 PMCID: PMC7698971 DOI: 10.1089/aid.2020.0033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022]
Abstract
HIV-TRAnsmission Cluster Engine (HIV-TRACE) and Cluster Picker are some of the most widely used programs for identifying HIV-1 transmission networks from nucleotide sequences. However, choosing between these tools is subjective and often a matter of personal preference. Because these programs use different algorithms to detect HIV-1 transmission networks, each is best suited to different sequence data sets and scenarios. The performance of these tools has previously been evaluated across a range of genetic distance thresholds without an assessment of the differences in the structure of networks identified. In this study, we tested both programs on the same HIV-1 pol sequence data set (n = 2,017) from three Ugandan populations to examine their performance across different risk groups and evaluate the structure of networks identified. HIV-TRACE, which uses a single-linkage algorithm, identified more nodes in the same networks that were connected by sparse links than Cluster Picker did. This suggests that the choice of the program used for identifying networks should depend on the study aims, the characteristics of the population being investigated, dynamics of the epidemic, sampling design, and the nature of research questions being addressed for optimum results. HIV-TRACE could be more applicable with larger data sets where the aim is to identify larger clusters that represent distinct transmission chains and in more diverse populations where infection has occurred over a period of time. In contrast, Cluster Picker is applicable in situations where more closely connected clusters are expected in the studied populations.
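To make the single-linkage behaviour concrete, the following Python sketch clusters sequences whenever a chain of pairwise distances at or below a threshold connects them. It is an illustration under stated assumptions, not the HIV-TRACE code: the function name, the input format, and the threshold value are hypothetical.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    names: List[str],
    distances: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group sequences linked, directly or through intermediates, by distances <= threshold."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every pair within the threshold is an edge; edges merge clusters.
    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())
```

Because one short link is enough to merge two groups, sequences can end up in the same cluster even when their own pairwise distance exceeds the threshold, which is consistent with the sparsely connected networks described in the abstract.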
Affiliation(s)
- Nicholas Bbosa
  - Medical Research Council/Uganda Virus Research Institute and London School of Hygiene & Tropical Medicine Uganda Research Unit, Entebbe, Uganda
  - Address correspondence to: Nicholas Bbosa, PhD, Medical Research Council (MRC)/Uganda Virus Research Institute (UVRI) and London School of Hygiene & Tropical Medicine (LSHTM) Uganda Research Unit, Plot 51-59 Nakiwogo Road, P. O. Box 49, Entebbe 256, Uganda
- Deogratius Ssemwanga
  - Medical Research Council/Uganda Virus Research Institute and London School of Hygiene & Tropical Medicine Uganda Research Unit, Entebbe, Uganda
  - Uganda Virus Research Institute, Entebbe, Uganda
- Pontiano Kaleebu
  - Medical Research Council/Uganda Virus Research Institute and London School of Hygiene & Tropical Medicine Uganda Research Unit, Entebbe, Uganda
  - Uganda Virus Research Institute, Entebbe, Uganda
22
Worobey M, Pekar J, Larsen BB, Nelson MI, Hill V, Joy JB, Rambaut A, Suchard MA, Wertheim JO, Lemey P. The emergence of SARS-CoV-2 in Europe and North America. Science 2020; 370:564-570. [PMID: 32912998 PMCID: PMC7810038 DOI: 10.1126/science.abc8169] [Citation(s) in RCA: 250] [Impact Index Per Article: 62.5] [Received: 05/18/2020] [Accepted: 09/03/2020] [Indexed: 12/16/2022]
Abstract
Accurate understanding of the global spread of emerging viruses is critical for public health responses and for anticipating and preventing future outbreaks. Here we elucidate when, where, and how the earliest sustained severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission networks became established in Europe and North America. Our results suggest that rapid early interventions successfully prevented early introductions of the virus from taking hold in Germany and the United States. Other, later introductions of the virus from China to both Italy and Washington state, United States, founded the earliest sustained European and North America transmission networks. Our analyses demonstrate the effectiveness of public health measures in preventing onward transmission and show that intensive testing and contact tracing could have prevented SARS-CoV-2 outbreaks from becoming established in these regions.
Affiliation(s)
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Jonathan Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Brendan B. Larsen
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Martha I. Nelson
  - Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA
- Verity Hill
  - Institute of Evolutionary Biology, University of Edinburgh, King’s Buildings, Edinburgh EH9 3FL, UK
- Jeffrey B. Joy
  - Department of Medicine, University of British Columbia, Vancouver, BC, Canada
  - BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
  - Bioinformatics Programme, University of British Columbia, Vancouver, BC, Canada
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, King’s Buildings, Edinburgh EH9 3FL, UK
- Marc A. Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Joel O. Wertheim
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Philippe Lemey
  - KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Epidemiological Virology, Leuven, Belgium
Corresponding authors: M.W., M.A.S., J.O.W., P.L.
23
Lequime S, Bastide P, Dellicour S, Lemey P, Baele G. nosoi: A stochastic agent-based transmission chain simulation framework in R. Methods Ecol Evol 2020; 11:1002-1007. [PMID: 32983401 PMCID: PMC7496779 DOI: 10.1111/2041-210x.13422] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/03/2020] [Accepted: 05/13/2020] [Indexed: 12/22/2022]
Abstract
The transmission process of an infectious agent creates a connected chain of hosts linked by transmission events, known as a transmission chain. Reconstructing transmission chains remains a challenging endeavour, except in rare cases characterized by intense surveillance and epidemiological inquiry. Inference frameworks attempt to estimate or approximate these transmission chains, but the accuracy and validity of such methods generally lack formal assessment on datasets for which the actual transmission chain was observed. We here introduce nosoi, an open-source R package that offers a complete, tunable and expandable agent-based framework to simulate transmission chains under a wide range of epidemiological scenarios for single-host and dual-host epidemics. nosoi is accessible through GitHub and CRAN, and is accompanied by extensive documentation, providing help and practical examples to assist users in setting up their own simulations. Once infected, each host or agent can undergo a series of events during each time step, such as moving (between locations) or transmitting the infection, all of these being driven by user-specified rules or data, such as travel patterns between locations. nosoi is able to generate a multitude of epidemic scenarios that can, for example, be used to validate a wide range of reconstruction methods, including epidemic modelling and phylodynamic analyses. nosoi also offers a comprehensive framework to leverage empirically acquired data, allowing the user to explore how variations in parameters can affect epidemic potential. Aside from research questions, nosoi can provide lecturers with a complete teaching tool to offer students a hands-on exploration of the dynamics of epidemiological processes and the factors that impact them. Because the package does not rely on mathematical formalism but uses a more intuitive algorithmic approach, even extensive changes of the entire model can be easily and quickly implemented.
Affiliation(s)
- Sebastian Lequime
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Cluster of Microbial Ecology, Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands
- Paul Bastide
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - IMAG, CNRS, University of Montpellier, Montpellier, France
- Simon Dellicour
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
  - Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Brussels, Belgium
- Philippe Lemey
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
- Guy Baele
  - Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Leuven, Belgium
24
Worobey M, Pekar J, Larsen BB, Nelson MI, Hill V, Joy JB, Rambaut A, Suchard MA, Wertheim JO, Lemey P. The emergence of SARS-CoV-2 in Europe and the US. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.05.21.109322. [PMID: 32511416 PMCID: PMC7265688 DOI: 10.1101/2020.05.21.109322] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 01/04/2023]
Abstract
Accurate understanding of the global spread of emerging viruses is critically important for public health response and for anticipating and preventing future outbreaks. Here, we elucidate when, where and how the earliest sustained SARS-CoV-2 transmission networks became established in Europe and the United States (US). Our results refute prior findings erroneously linking cases in January 2020 with outbreaks that occurred weeks later. Instead, rapid interventions successfully prevented onward transmission of those early cases in Germany and Washington State. Other, later introductions of the virus from China to both Italy and Washington State founded the earliest sustained European and US transmission networks. Our analyses reveal an extended period of missed opportunity when intensive testing and contact tracing could have prevented SARS-CoV-2 from becoming established in the US and Europe.
Affiliation(s)
- Michael Worobey
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Jonathan Pekar
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, USA
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Brendan B. Larsen
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Martha I. Nelson
  - Fogarty International Center, National Institutes of Health, Bethesda, Maryland 20892, USA
- Verity Hill
  - Institute of Evolutionary Biology, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- Jeffrey B. Joy
  - Department of Medicine, University of British Columbia, Vancouver, BC, Canada
  - BC Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
  - Bioinformatics Programme, University of British Columbia, Vancouver, BC
- Andrew Rambaut
  - Institute of Evolutionary Biology, University of Edinburgh, King’s Buildings, Edinburgh, EH9 3FL, UK
- Marc A. Suchard
  - Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Joel O. Wertheim
  - Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Philippe Lemey
  - KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, Laboratory of Clinical and Evolutionary Virology, Leuven, Belgium
25
Reimering S, Muñoz S, McHardy AC. Phylogeographic reconstruction using air transportation data and its application to the 2009 H1N1 influenza A pandemic. PLoS Comput Biol 2020; 16:e1007101. [PMID: 32032362 PMCID: PMC7032730 DOI: 10.1371/journal.pcbi.1007101] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/10/2019] [Revised: 02/20/2020] [Accepted: 01/12/2020] [Indexed: 12/02/2022]
Abstract
Influenza A viruses cause seasonal epidemics and occasional pandemics in the human population. While the worldwide circulation of seasonal influenza is at least partly understood, the exact migration patterns between countries, states or cities are not well studied. Here, we use the Sankoff algorithm for parsimonious phylogeographic reconstruction together with effective distances based on a worldwide air transportation network. By first simulating geographic spread and then phylogenetic trees and genetic sequences, we confirmed that reconstructions with effective distances inferred phylogeographic spread more accurately than reconstructions with geographic distances and Bayesian reconstructions with BEAST that do not use any distance information, and led to comparable results to the Bayesian reconstruction using distance information via a generalized linear model. Our method extends Bayesian methods that estimate rates from the data by using fine-grained locations like airports and inferring intermediate locations not observed among sampled isolates. When applied to sequence data of the pandemic H1N1 influenza A virus in 2009, our approach correctly inferred the origin and proposed airports mainly involved in the spread of the virus. In the case of a novel outbreak, this approach allows sequence data to be analyzed rapidly and the origin and spread routes to be inferred, improving disease surveillance and control.
Influenza A viruses infect up to 5 million people in recurring epidemics every year. Further, viruses of zoonotic origin constantly pose a pandemic risk. Understanding the geographical spread of these viruses, including the origin and the main spread routes between cities, states or countries, could help to monitor or contain novel outbreaks. Based on genetic sequences and sampling locations, the geographic spread can be reconstructed along a phylogenetic tree. Our approach uses a parsimonious reconstruction with air transportation data and was verified using a simulation of the 2009 H1N1 influenza A pandemic. Applied to real sequence data of the outbreak, our analysis gave detailed insights into spread patterns of influenza A viruses, highlighting the origin as well as airports mainly involved in the spread.
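The bottom-up pass of the Sankoff algorithm named in the abstract can be sketched in a few lines of Python. This is a generic textbook version under stated assumptions, not the authors' implementation: the nested-tuple tree encoding, the function name, and any cost values passed in (standing in for effective distances) are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a leaf is its observed location (a string);
# an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def sankoff(
    tree: Tree,
    locations: List[str],
    cost: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """For each location, return the minimum total cost of the subtree
    if the subtree's root is assigned to that location."""
    if isinstance(tree, str):
        # A leaf can only be at its observed location.
        return {loc: (0.0 if loc == tree else float("inf")) for loc in locations}
    child_tables = [sankoff(child, locations, cost) for child in tree]
    return {
        loc: sum(
            # Cheapest way to place this child: pay for the move loc -> dest
            # plus the best cost of the child's subtree rooted at dest.
            min(cost[loc][dest] + table[dest] for dest in locations)
            for table in child_tables
        )
        for loc in locations
    }
```

The location with the smallest value in the returned table is the most parsimonious root assignment; a full reconstruction would add a top-down traceback to assign locations to the remaining internal nodes, which is omitted here.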
Affiliation(s)
- Susanne Reimering
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Sebastian Muñoz
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Alice C. McHardy
  - Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
26
Abstract
Phylogenetic trees are essential to evolutionary biology, and numerous methods exist that attempt to extract phylogenetic information applicable to a wide range of disciplines, such as epidemiology and metagenomics. Currently, the three main Python packages for trees are Bio.Phylo, DendroPy, and the ETE Toolkit, but as dataset sizes grow, parsing and manipulating ultra-large trees becomes impractical for these tools. To address this issue, we present TreeSwift, a user-friendly and massively scalable Python package for traversing and manipulating trees that is ideal for algorithms performed on ultra-large trees.
Affiliation(s)
- N Moshiri
  - Department of Computer Science and Engineering, UC San Diego, 92093, USA
27
Liesenborgs J, Hendrickx DM, Kuylen E, Niyukuri D, Hens N, Delva W. SimpactCyan 1.0: An Open-source Simulator for Individual-Based Models in HIV Epidemiology with R and Python Interfaces. Sci Rep 2019; 9:19289. [PMID: 31848434 PMCID: PMC6917719 DOI: 10.1038/s41598-019-55689-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/31/2018] [Accepted: 11/29/2019] [Indexed: 01/21/2023]
Abstract
SimpactCyan is an open-source simulator for individual-based models in HIV epidemiology. Its core algorithm is written in C++ for computational efficiency, while the R and Python interfaces aim to make the tool accessible to the fast-growing community of R and Python users. Transmission, treatment and prevention of HIV infections in dynamic sexual networks are simulated by discrete events. A generic “intervention” event allows model parameters to be changed over time, and can be used to model medical and behavioural HIV prevention programmes. First, we describe a more efficient variant of the modified Next Reaction Method that drives our continuous-time simulator. Next, we outline key built-in features and assumptions of individual-based models formulated in SimpactCyan, and provide code snippets for how to formulate, execute and analyse models in SimpactCyan through its R and Python interfaces. Lastly, we give two examples of applications in HIV epidemiology: the first demonstrates how the software can be used to estimate the impact of progressive changes to the eligibility criteria for HIV treatment on HIV incidence. The second example illustrates the use of SimpactCyan as a data-generating tool for assessing the performance of a phylodynamic inference framework.
Affiliation(s)
- Jori Liesenborgs
  - Expertise Centre for Digital Media, Hasselt University - tUL, Diepenbeek, Belgium
- Diana M Hendrickx
  - Center for Statistics, I-BioStat, Hasselt University, Diepenbeek, Belgium
- Elise Kuylen
  - IDLab, University of Antwerp, Antwerp, Belgium
  - Centre for Health Economics Research and Modelling Infectious Diseases and Centre for the Evaluation of Vaccination, Vaccine & Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
- David Niyukuri
  - The South African Department of Science and Technology-National Research Foundation (DST-NRF) Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
  - Department of Global Health, Faculty of Medicine and Health, Stellenbosch University, Stellenbosch, South Africa
- Niel Hens
  - Center for Statistics, I-BioStat, Hasselt University, Diepenbeek, Belgium
  - Centre for Health Economics Research and Modelling Infectious Diseases and Centre for the Evaluation of Vaccination, Vaccine & Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
- Wim Delva
  - Center for Statistics, I-BioStat, Hasselt University, Diepenbeek, Belgium
  - The South African Department of Science and Technology-National Research Foundation (DST-NRF) Centre of Excellence in Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
  - Department of Global Health, Faculty of Medicine and Health, Stellenbosch University, Stellenbosch, South Africa
  - International Centre for Reproductive Health, Ghent University, Ghent, Belgium
  - Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
  - School for Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
28
Balaban M, Moshiri N, Mai U, Jia X, Mirarab S. TreeCluster: Clustering biological sequences using phylogenetic trees. PLoS One 2019; 14:e0221068. [PMID: 31437182 PMCID: PMC6705769 DOI: 10.1371/journal.pone.0221068] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 05/26/2019] [Accepted: 07/26/2019] [Indexed: 02/01/2023]
Abstract
Clustering homologous sequences based on their similarity is a problem that appears in many bioinformatics applications. The fact that sequences cluster is ultimately the result of their phylogenetic relationships. Despite this observation and the natural ways in which a tree can define clusters, most applications of sequence clustering do not use a phylogenetic tree and instead operate on pairwise sequence distances. Due to advances in large-scale phylogenetic inference, we argue that tree-based clustering is under-utilized. We define a family of optimization problems that, given an arbitrary tree, return the minimum number of clusters such that all clusters adhere to constraints on their heterogeneity. We study three specific constraints, limiting (1) the diameter of each cluster, (2) the sum of its branch lengths, or (3) chains of pairwise distances. These three problems can be solved in time that increases linearly with the size of the tree, and for two of the three criteria, the algorithms are already known in the theoretical computer science literature. We implement these algorithms in a tool called TreeCluster, which we test on three applications: OTU clustering for microbiome data, HIV transmission clustering, and divide-and-conquer multiple sequence alignment. We show that, by using tree-based distances, TreeCluster generates more internally consistent clusters than alternatives and improves the effectiveness of downstream applications. TreeCluster is available at https://github.com/niemasd/TreeCluster.
Affiliation(s)
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, La Jolla, CA 92093, United States of America
- Niema Moshiri
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, La Jolla, CA 92093, United States of America
- Uyen Mai
  - Computer Science and Engineering, UC San Diego, La Jolla, CA 92093, United States of America
- Xingfan Jia
  - Department of Mathematics, UC San Diego, La Jolla, CA 92093, United States of America
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, La Jolla, CA 92093, United States of America