1
Weaver S, Dávila-Conn V, Ji D, Verdonk H, Ávila-Ríos S, Leigh Brown AJ, Wertheim JO, Kosakovsky Pond SL. Auto-Tune: selecting the distance threshold for inferring HIV transmission clusters. bioRxiv: the preprint server for biology 2024:2024.03.11.584522. [PMID: 38559140 PMCID: PMC10979987 DOI: 10.1101/2024.03.11.584522] [Indexed: 04/04/2024]
Abstract
Molecular surveillance of viral pathogens and inference of transmission networks from genomic data play an increasingly important role in public health efforts, especially for HIV-1. For many methods, the genetic distance threshold used to connect sequences in the transmission network is a key parameter informing the properties of inferred networks. Using a distance threshold that is too high can result in a network with many spurious links, making it difficult to interpret. Conversely, a distance threshold that is too low can result in a network with too few links, which may not capture key insights into clusters of public health concern. Published research using the HIV-TRACE software package frequently uses the default threshold of 0.015 substitutions/site for HIV pol gene sequences, but in many cases, investigators heuristically select other threshold parameters to better capture the underlying dynamics of the epidemic they are studying. Here, we present a general heuristic scoring approach for tuning a distance threshold adaptively, which seeks to prevent the formation of giant clusters. We prioritize the ratio of the sizes of the largest and the second largest cluster, maximizing the number of clusters present in the network. We apply our scoring heuristic to outbreaks with different characteristics, such as regional or temporal variability, and demonstrate the utility of using the scoring mechanism's suggested distance threshold to identify clusters exhibiting risk factors that would have otherwise been more difficult to identify. For example, while we found that a 0.015 substitutions/site distance threshold is typical for US-like epidemics, recent outbreaks like the CRF07_BC subtype among men who have sex with men (MSM) in China have been found to have a lower optimal threshold of 0.005 to better capture the transition from injected drug use (IDU) to MSM as the primary risk factor. 
Alternatively, in communities surrounding Lake Victoria in Uganda, where there has been sustained heterosexual transmission for many years, we found that a larger distance threshold is necessary to capture a population with more diverse risk factors and sparse sampling over a longer period of time. Such identification may allow for more informed intervention by the respective public health officials.
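The threshold-linking idea in this abstract can be illustrated with a minimal sketch. It is not the AUTO-TUNE implementation: sequences are linked when their pairwise distance is at or below the threshold, clusters are connected components of two or more sequences, and the selection rule in the hypothetical `pick_threshold` (the `max_ratio` cap and the tie-break toward smaller thresholds) is an assumed stand-in for the published score, which the abstract describes only as prioritizing the ratio of the largest to the second-largest cluster while maximizing the number of clusters. All function names and the toy distances are made up.

```python
from itertools import combinations


def clusters_at_threshold(n, distances, threshold):
    """Return clusters (connected components of >= 2 sequences) formed by
    linking every pair whose distance is at or below `threshold`.

    `distances` maps index pairs (i, j), i < j, to a distance in
    substitutions/site; pairs missing from the map are never linked.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), d in distances.items():
        if d <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    # Singletons are unclustered sequences, not clusters.
    return [g for g in groups.values() if len(g) > 1]


def pick_threshold(n, distances, thresholds, max_ratio=4.0):
    """HYPOTHETICAL selection rule, not the published AUTO-TUNE score:
    drop thresholds where the largest cluster is more than `max_ratio` times
    the second largest (or where fewer than two clusters exist), then keep the
    threshold with the most clusters, breaking ties toward the smaller value."""
    table = []
    for t in thresholds:
        sizes = sorted((len(c) for c in clusters_at_threshold(n, distances, t)), reverse=True)
        ratio = sizes[0] / sizes[1] if len(sizes) >= 2 else float("inf")
        table.append((t, len(sizes), ratio))
    admissible = [row for row in table if row[2] <= max_ratio]
    if not admissible:
        return None, table
    best = max(admissible, key=lambda row: (row[1], -row[0]))
    return best[0], table


# Made-up distances for six sequences; unlisted pairs sit far apart (0.05).
close = {(0, 1): 0.004, (1, 2): 0.010, (3, 4): 0.006, (2, 3): 0.020, (4, 5): 0.012}
toy = {pair: close.get(pair, 0.050) for pair in combinations(range(6), 2)}
best, table = pick_threshold(6, toy, [0.005, 0.010, 0.015, 0.025])
print(best)  # 0.01
```

In this toy run the highest threshold merges everything into one giant cluster and is rejected, which is the failure mode the abstract warns about.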
Affiliation(s)
- Steven Weaver
- Center for Viral Evolution, Temple University, Philadelphia, PA, USA
- Vanessa Dávila-Conn
- Center for Research in Infectious Diseases, National Institute of Respiratory Diseases, Mexico City, Mexico
- Daniel Ji
- Department of Computer Science & Engineering, UC San Diego, La Jolla, CA 92093, USA
- Hannah Verdonk
- Center for Viral Evolution, Temple University, Philadelphia, PA, USA
- Santiago Ávila-Ríos
- Center for Research in Infectious Diseases, National Institute of Respiratory Diseases, Mexico City, Mexico
- Andrew J Leigh Brown
- School of Biological Sciences, University of Edinburgh, Edinburgh, Scotland, United Kingdom
- Joel O Wertheim
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
2
Switzer WM, Shankar A, Jia H, Knyazev S, Ambrosio F, Kelly R, Zheng H, Campbell EM, Cintron R, Pan Y, Saduvala N, Panneer N, Richman R, Singh MB, Thoroughman DA, Blau EF, Khalil GM, Lyss S, Heneine W. High HIV diversity, recombination, and superinfection revealed in a large outbreak among persons who inject drugs in Kentucky and Ohio, USA. Virus Evol 2024; 10:veae015. [PMID: 38510920 PMCID: PMC10953796 DOI: 10.1093/ve/veae015] [Received: 09/11/2023] [Revised: 01/30/2024] [Accepted: 02/05/2024] [Indexed: 03/22/2024]
Abstract
We investigated the transmission dynamics of a large human immunodeficiency virus (HIV) outbreak among persons who inject drugs (PWID) in KY and OH during 2017-20 by using detailed phylogenetic, network, recombination, and cluster dating analyses. Using polymerase (pol) sequences from 193 people associated with the investigation, we document high HIV-1 diversity, including Subtype B (44.6 per cent); numerous circulating recombinant forms (CRFs) including CRF02_AG (2.5 per cent) and CRF02_AG-like (21.8 per cent); and many unique recombinant forms composed of CRFs with major subtypes and sub-subtypes [CRF02_AG/B (24.3 per cent), B/CRF02_AG/B (0.5 per cent), and A6/D/B (6.4 per cent)]. Cluster analysis of sequences using a 1.5 per cent genetic distance threshold identified thirteen clusters, including a seventy-five-member cluster composed of CRF02_AG-like and CRF02_AG/B, an eighteen-member CRF02_AG/B cluster, Subtype B clusters of sizes ranging from two to twenty-three, and a nine-member A6/D and A6/D/B cluster. Recombination and phylogenetic analyses identified CRF02_AG/B variants with ten unique breakpoints likely originating from Subtype B and CRF02_AG-like viruses in the largest clusters. Adding contact tracing results from OH to the genetic networks identified linkage between persons with Subtype B, CRF02_AG, and CRF02_AG/B sequences in the clusters, supporting de novo recombinant generation. Superinfection prevalence was 13.3 per cent (8/60) in persons with multiple specimens and included infection with B and CRF02_AG; B and CRF02_AG/B; or B and A6/D/B. In addition to the presence of multiple, distinct molecular clusters associated with this outbreak, cluster dating inferred that transmission associated with the largest molecular cluster occurred as early as 2006, with high transmission rates during 2017-18 in certain other molecular clusters.
This outbreak among PWID in KY and OH was likely driven by rapid transmission of multiple HIV-1 variants including de novo viral recombinants from circulating viruses within the community. Our findings documenting the high HIV-1 transmission rate and clustering through partner services and molecular clusters emphasize the importance of leveraging multiple different data sources and analyses, including those from disease intervention specialist investigations, to better understand outbreak dynamics and interrupt HIV spread.
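The abstract does not name the distance measure behind its 1.5 per cent threshold. HIV-TRACE, the tool discussed in the first entry, uses the Tamura-Nei (TN93) distance, which corrects for multiple substitutions; the sketch below instead uses the simpler uncorrected p-distance, purely to show what linking two sequences at 1.5 per cent means. The function names, the choice to skip gaps and ambiguous bases, and the toy sequences are assumptions.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned nucleotide sequences.
    Sites where either sequence has a gap or a non-ACGT character are skipped
    (an assumption; TN93 implementations handle ambiguities differently)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def linked(seq_a, seq_b, threshold=0.015):
    """True when the pair lies at or below the distance threshold (1.5 per cent here)."""
    return p_distance(seq_a, seq_b) <= threshold


ref = "ACGT" * 50          # made-up 200-site alignment
near = "G" + ref[1:]       # one difference: 1/200 = 0.005
far = "GTTA" + ref[4:]     # four differences: 4/200 = 0.02
gapped = "-" + ref[1:]     # leading gap is skipped; 199 sites compared
print(p_distance(ref, near), linked(ref, near))  # 0.005 True
print(p_distance(ref, far), linked(ref, far))    # 0.02 False
```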
Affiliation(s)
- William M Switzer
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Anupama Shankar
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Hongwei Jia
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Sergey Knyazev
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Oak Ridge Institute for Science and Education, 1299 Bethel Valley Rd, Oak Ridge, TN 37830, USA
- Frank Ambrosio
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Reagan Kelly
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- General Dynamics Information Technology, 3150 Fairview Park Dr, Falls Church, VA 22042, USA
- HaoQiang Zheng
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Roxana Cintron
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Yi Pan
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Nivedha Panneer
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Rhiannon Richman
- HIV Surveillance Program, Bureau of HIV/STI/Viral Hepatitis, Ohio Department of Health, 246 North High Street, Columbus, OH 43215, USA
- Manny B Singh
- Division of Epidemiology and Health Planning, Kentucky Department for Public Health, Frankfort, KY 40621, USA
- Douglas A Thoroughman
- Division of Epidemiology and Health Planning, Kentucky Department for Public Health, Frankfort, KY 40621, USA
- ORR/Division of State and Local Readiness/Field Services Branch/CEFO Program, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Erin F Blau
- Division of Epidemiology and Health Planning, Kentucky Department for Public Health, Frankfort, KY 40621, USA
- Epidemic Intelligence Service, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- George M Khalil
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Sheryl Lyss
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
- HIV Surveillance Program, Bureau of HIV/STI/Viral Hepatitis, Ohio Department of Health, 246 North High Street, Columbus, OH 43215, USA
- Division of Epidemiology and Health Planning, Kentucky Department for Public Health, Frankfort, KY 40621, USA
- Hamilton County Public Health, 250 William Howard Taft Rd, Cincinnati, OH 45219, USA
- Northern Kentucky Health Department, 8001 Veterans Memorial Drive, Florence, KY 41042, USA
- Walid Heneine
- Division of HIV Prevention, CDC, 1600 Clifton Rd, Atlanta, GA 30329, USA
3
Optimized phylogenetic clustering of HIV-1 sequence data for public health applications. PLoS Comput Biol 2022; 18:e1010745. [PMID: 36449514 PMCID: PMC9744331 DOI: 10.1371/journal.pcbi.1010745] [Received: 03/10/2022] [Revised: 12/12/2022] [Accepted: 11/17/2022] [Indexed: 12/02/2022]
Abstract
Clusters of genetically similar infections suggest rapid transmission and may indicate priorities for public health action or reveal underlying epidemiological processes. However, clusters often require user-defined thresholds and are sensitive to non-epidemiological factors, such as non-random sampling. Consequently, the ideal threshold for public health applications varies substantially across settings. Here, we show a method that selects optimal thresholds for phylogenetic (subset tree) clustering based on population. We evaluated this method on HIV-1 pol datasets (n = 14,221 sequences) from four sites in the USA (Tennessee, Washington), Canada (Northern Alberta), and China (Beijing). Clusters were defined by tips descending from an ancestral node (with a minimum bootstrap support of 95%) through a series of branches, each with a length below a given threshold. Next, we used pplacer to graft new cases to the fixed tree by maximum likelihood. We evaluated the effect of varying branch-length thresholds on cluster growth as a count outcome by fitting two Poisson regression models: a null model that predicts growth from cluster size, and an alternative model that includes mean collection date as an additional covariate. The alternative model was favoured by AIC across most thresholds, with optimal (greatest difference in AIC) thresholds ranging from 0.007 to 0.013 across sites. The range of optimal thresholds was more variable when re-sampling 80% of the data by location (IQR 0.008-0.016, n = 100 replicates). Our results use prospective phylogenetic cluster growth and suggest that there is more variation in effective thresholds for public health than in those typically used in clustering studies.
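The AIC comparison described in this abstract can be sketched with a small Poisson regression fitted by iteratively reweighted least squares (IRLS). This is a hedged illustration, not the authors' code: the data layout (per-threshold lists of cluster sizes, growth counts taken as the number of new cases grafted to each cluster, and centred mean collection dates), the function names, and the toy numbers are all assumptions, and the tree building, bootstrap filtering, and pplacer grafting steps are not reproduced.

```python
import math

import numpy as np


def poisson_loglik_fit(X, y, iters=100, tol=1e-10):
    """Fit a log-link Poisson regression by IRLS (Newton's method for the
    canonical link). Column 0 of X must be the intercept and mean(y) > 0.
    Returns (coefficients, maximized log-likelihood)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response
        XtW = X.T * mu                     # X^T diag(mu): log-link weights
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if done:
            break
    mu = np.exp(X @ beta)
    loglik = float(np.sum(y * np.log(mu) - mu)) - sum(math.lgamma(v + 1) for v in y)
    return beta, loglik


def aic_gain(sizes, growth, mean_dates):
    """AIC(null) - AIC(alternative). Null: growth ~ cluster size.
    Alternative: growth ~ cluster size + mean collection date.
    Positive values favour the model that includes date."""
    ones = np.ones(len(sizes))
    x_null = np.column_stack([ones, sizes])
    x_alt = np.column_stack([ones, sizes, mean_dates])
    _, ll_null = poisson_loglik_fit(x_null, growth)
    _, ll_alt = poisson_loglik_fit(x_alt, growth)
    aic_null = 2 * x_null.shape[1] - 2 * ll_null
    aic_alt = 2 * x_alt.shape[1] - 2 * ll_alt
    return aic_null - aic_alt


def best_threshold(per_threshold):
    """Return the threshold with the greatest AIC difference, plus all gains."""
    gains = {t: aic_gain(*data) for t, data in per_threshold.items()}
    return max(gains, key=gains.get), gains


# Made-up toy inputs for two hypothetical thresholds.
sizes = [2, 3, 4, 5, 2, 3, 4, 5]
dates = [-1, -1, -1, -1, 1, 1, 1, 1]  # centred mean collection dates (years)
per_threshold = {
    0.010: (sizes, [1, 2, 3, 4, 1, 2, 3, 4], dates),  # date adds nothing
    0.015: (sizes, [0, 0, 1, 0, 3, 4, 5, 6], dates),  # recent clusters grew more
}
best, gains = best_threshold(per_threshold)
print(best)  # 0.015
```

Because the two models are nested, the alternative's log-likelihood is never lower, so the gain cannot fall below -2 (the penalty for one extra parameter); the first toy threshold sits exactly at that floor because date carries no information there. Centring the dates keeps the IRLS system well conditioned.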