1
Croucher NJ, Page AJ, Connor TR, Delaney AJ, Keane JA, Bentley SD, Parkhill J, Harris SR. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res 2014; 43:e15. [PMID: 25414349 PMCID: PMC4330336 DOI: 10.1093/nar/gku1196]
Abstract
The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates' recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X.
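The idea of scanning an alignment for loci with an elevated density of base substitutions can be illustrated with a deliberately simplified sketch. This is not the Gubbins implementation: the fixed non-overlapping windows, the Poisson null model, and the names `poisson_tail` and `scan_dense_windows` are all assumptions made for illustration only.

```python
import math
from bisect import bisect_left


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def scan_dense_windows(snp_positions, genome_length, window, alpha=0.01):
    """Flag windows whose substitution count is unexpectedly high.

    Hypothetical helper, not part of Gubbins. Positions are 0-based and
    windows are half-open intervals [start, end). The null model assumes
    substitutions fall uniformly along the genome, so every full-length
    window expects the same Poisson mean.
    """
    positions = sorted(snp_positions)
    lam = len(positions) * window / genome_length  # expected count per window
    flagged = []
    for start in range(0, genome_length, window):
        end = min(start + window, genome_length)
        # Number of substitutions with start <= position < end.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        if count > 0 and poisson_tail(count, lam) < alpha:
            flagged.append((start, end, count))
    return flagged


# Toy data: one background substitution per 1 kb window, plus a tight
# cluster of ten substitutions that mimics an imported segment.
background = list(range(500, 10000, 1000))
cluster = list(range(4100, 4200, 10))
hits = scan_dense_windows(background + cluster, genome_length=10000, window=1000)
```

In this toy input only the window containing the cluster is flagged. A real analysis would also need to handle a shorter final window, multiple testing, and the per-branch reconstruction that the abstract describes.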
Affiliation(s)
- Nicholas J Croucher: Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Center for Communicable Disease Dynamics, Harvard School of Public Health, 677 Longwood Avenue, Boston, MA 02115, USA; Department of Infectious Disease Epidemiology, Imperial College London, St. Mary's Campus, Norfolk Place, London W2 1PG, UK
- Andrew J Page: Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Thomas R Connor: Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Cardiff School of Biosciences, Sir Martin Evans Building, Museum Avenue, Cardiff CF10 3AX, UK
- Aidan J Delaney: School of Computing, Engineering and Mathematics, University of Brighton, Brighton BN2 4GJ, UK
- Jacqueline A Keane: Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Stephen D Bentley: Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Department of Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge CB2 0SP, UK
- Julian Parkhill: Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Simon R Harris: Pathogen Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
2
Mortimer TD, Pepperell CS. Genomic signatures of distributive conjugal transfer among mycobacteria. Genome Biol Evol 2014; 6:2489-500. [PMID: 25173757 PMCID: PMC4202316 DOI: 10.1093/gbe/evu175]
Abstract
Distributive conjugal transfer (DCT) is a newly described mechanism of lateral gene transfer (LGT) that results in a mosaic transconjugant structure, similar to the products of meiosis. We have tested popular LGT detection methods on whole-genome sequence data from experimental DCT transconjugants and used the best performing methods to compare genomic signatures of DCT with those of LGT through natural transformation, conjugative plasmids, and mobile genetic elements (MGE). We found that DCT results in transfer of larger chromosomal segments, that these segments are distributed more broadly around the chromosome, and that a greater proportion of the chromosome is affected by DCT than by other mechanisms of LGT. We used the best performing methods to characterize LGT in Mycobacterium canettii, the mycobacterial species most closely related to Mycobacterium tuberculosis. Patterns of LGT among M. canettii were highly distinctive. Gene flow appeared unidirectional, from lineages with minimal evidence of LGT to isolates with a substantial proportion (6–13%) of sites identified as recombinant. Among M. canettii isolates with evidence of LGT, recombinant fragments were larger and more evenly distributed relative to bacteria that undergo LGT through natural transformation, conjugative plasmids, and MGE. Spatial bias in M. canettii was also unusual in that patterns of recombinant fragment sharing mirrored overall phylogenetic structure. Based on the proportion of recombinant sites, the size of recombinant fragments, their spatial distribution and lack of association with MGE, as well as unidirectionality of DNA transfer, we conclude that DCT is the predominant mechanism of LGT among M. canettii.
Affiliation(s)
- Tatum D Mortimer: Department of Medical Microbiology and Immunology, University of Wisconsin-Madison; Microbiology Doctoral Training Program, University of Wisconsin-Madison
- Caitlin S Pepperell: Department of Medical Microbiology and Immunology, University of Wisconsin-Madison; Department of Medicine, Division of Infectious Diseases, University of Wisconsin-Madison
3
Simian foamy virus infection of rhesus macaques in Bangladesh: relationship of latent proviruses and transcriptionally active viruses. J Virol 2013; 87:13628-39. [PMID: 24109214 DOI: 10.1128/jvi.01989-13]
Abstract
Simian foamy viruses (SFV) are complex retroviruses that are ubiquitous in nonhuman primates (NHP) and are zoonotically transmitted to humans, presumably through NHP saliva, by licking, biting, and other behaviors. We have studied SFV in free-ranging rhesus macaques in Bangladesh. It has been previously shown that SFV in immunocompetent animals replicates to detectable levels only in superficial epithelial cells of the oral mucosa, although latent proviruses are found in most, if not all, tissues. In this study, we compare DNA sequences from latent SFV proviruses found in blood cells of 30 Bangladesh rhesus macaques to RNA sequences of transcriptionally active SFV from buccal swabs obtained from the same animals. Viral strains, defined by differences in SFV gag sequences, from buccal mucosal specimens overlapped with those from blood samples in 90% of animals. Thus, latent proviruses in peripheral blood mononuclear cells (PBMC) are, to a great extent, representative of viruses likely to be transmitted to other hosts. The level of SFV RNA in buccal swabs varied greatly between macaques, with increasing amounts of viral RNA in older animals. Evidence of APOBEC3-induced mutations was found in gag sequences derived from the blood and oral mucosa.
4
Engel GA, Small CT, Soliven K, Feeroz MM, Wang X, Kamrul Hasan M, Oh G, Rabiul Alam SM, Craig KL, Jackson DL, Matsen IV FA, Linial ML, Jones-Engel L. Zoonotic simian foamy virus in Bangladesh reflects diverse patterns of transmission and co-infection. Emerg Microbes Infect 2013; 2:e58. [PMID: 26038489 PMCID: PMC3820988 DOI: 10.1038/emi.2013.60] [Received: 05/21/2013] [Revised: 07/23/2013] [Accepted: 07/30/2013]
Abstract
Simian foamy viruses (SFVs) are ubiquitous in non-human primates (NHPs). As in all retroviruses, reverse transcription of SFV leads to recombination and mutation. Because more humans have been shown to be infected with SFV than with any other simian borne virus, SFV is a potentially powerful model for studying the virology and epidemiology of viruses at the human/NHP interface. In Asia, SFV is likely transmitted to humans through macaque bites and scratches that occur in the context of everyday life. We analyzed multiple proviral sequences from the SFV gag gene from both humans and macaques in order to characterize retroviral transmission at the human/NHP interface in Bangladesh. Here we report evidence that humans can be concurrently infected with multiple SFV strains, with some individuals infected by both an autochthonous SFV strain as well as a strain similar to SFV found in macaques from another geographic area. These data, combined with previous results, suggest that both human-facilitated movement of macaques leading to the introduction of non-resident strains of SFV and retroviral recombination in macaques contribute to SFV diversity among humans in Bangladesh.
Affiliation(s)
- Gregory A Engel: National Primate Research Center, University of Washington, Seattle, WA 98195, USA; Department of Family Medicine, Swedish Medical Center, Seattle, WA 98122, USA
- Christopher T Small: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Khanh Soliven: Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Mostafa M Feeroz: Department of Zoology, Jahangirnagar University, Savar, Dhaka-1342, Bangladesh
- Xiaoxing Wang: Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- M Kamrul Hasan: Department of Zoology, Jahangirnagar University, Savar, Dhaka-1342, Bangladesh
- Gunwha Oh: National Primate Research Center, University of Washington, Seattle, WA 98195, USA
- S M Rabiul Alam: Department of Zoology, Jahangirnagar University, Savar, Dhaka-1342, Bangladesh
- Karen L Craig: Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Dana L Jackson: Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Frederick A Matsen IV: Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Maxine L Linial: Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Lisa Jones-Engel: National Primate Research Center, University of Washington, Seattle, WA 98195, USA
5
Chung Y, Perna NT, Ané C. Computing the joint distribution of tree shape and tree distance for gene tree inference and recombination detection. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1263-1274. [PMID: 24384712 DOI: 10.1109/tcbb.2013.109]
Abstract
Ancestral recombination events can cause the underlying genealogy of a site to vary along the genome. We consider Bayesian models to simultaneously detect recombination breakpoints in very long sequence alignments and estimate the phylogenetic tree of each block between breakpoints. The models we consider use a dissimilarity measure between trees in their prior distribution to favor similar trees at neighboring loci. We show empirical evidence in Enterobacteria that neighboring genomic regions have similar trees. The main hurdle in using such models is the need to properly calculate the normalizing function for the prior probabilities on trees. In this work, we quantify the impact of approximating this normalizing function as done in biomc2, a hierarchical Bayesian method to detect recombination based on distance between tree topologies. We then derive an algorithm to calculate the normalizing function exactly, for a Gibbs distribution based on the Robinson-Foulds (RF) distance between gene trees at neighboring loci. At the core is the calculation of the joint distribution of the shape of a random tree and its RF distance to a fixed tree. We also propose fast approximations to the normalizing function, which are shown to be very accurate with little impact on the Bayesian inference.
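For readers unfamiliar with the Robinson-Foulds (RF) distance that the prior above is built on, the following is a minimal sketch of the distance itself, not of the normalizing-function algorithm in the paper. It assumes trees are written as nested tuples of taxon-name strings; the function names `leaf_set`, `splits`, and `robinson_foulds` are illustrative, not taken from biomc2 or any other package.

```python
def leaf_set(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of an unrooted tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor taxon, so the same split always has the same representation
    regardless of where the nested tuples happen to be rooted.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits (a single leaf against everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Count bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))


# Two four-taxon trees that disagree on their single internal branch.
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
distance = robinson_foulds(t1, t2)
```

A Gibbs-type prior of the kind described would then weight a neighboring tree pair by a decreasing function of this distance; the exact form and its normalizing function are the subject of the paper and are not reproduced here.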
6
Population dynamics of rhesus macaques and associated foamy virus in Bangladesh. Emerg Microbes Infect 2013; 2:e29. [PMID: 26038465 PMCID: PMC3675400 DOI: 10.1038/emi.2013.23] [Received: 02/21/2013] [Revised: 03/11/2013] [Accepted: 03/14/2013]
Abstract
Foamy viruses are complex retroviruses that have been shown to be transmitted from nonhuman primates to humans. In Bangladesh, infection with simian foamy virus (SFV) is ubiquitous among rhesus macaques, which come into contact with humans in diverse locations and contexts throughout the country. We analyzed microsatellite DNA from 126 macaques at six sites in Bangladesh in order to characterize geographic patterns of macaque population structure. We also included in this study 38 macaques owned by nomadic people who train them to perform for audiences. PCR was used to analyze a portion of the proviral gag gene from all SFV-positive macaques, and multiple clones were sequenced. Phylogenetic analysis was used to infer long-term patterns of viral transmission. Analyses of SFV gag gene sequences indicated that macaque populations from different areas harbor genetically distinct strains of SFV, suggesting that geographic features such as forest cover play a role in determining the dispersal of macaques and SFV. We also found evidence suggesting that humans traveling the region with performing macaques likely play a role in the translocation of macaques and SFV. Our studies found that individual animals can harbor more than one strain of SFV and that presence of more than one SFV strain is more common among older animals. Some macaques are infected with SFV that appears to be recombinant. These findings paint a more detailed picture of how geographic and sociocultural factors influence the spectrum of simian-borne retroviruses.
7
Truszkowski J, Brown DG. More accurate recombination prediction in HIV-1 using a robust decoding algorithm for HMMs. BMC Bioinformatics 2011; 12:168. [PMID: 21586147 PMCID: PMC3123234 DOI: 10.1186/1471-2105-12-168] [Received: 08/07/2010] [Accepted: 05/17/2011]
Abstract
Background: Identifying recombinations in HIV is important for studying the epidemiology of the virus and aids in the design of potential vaccines and treatments. The previous widely-used tool for this task uses the Viterbi algorithm in a hidden Markov model to model recombinant sequences. Results: We apply a new decoding algorithm for this HMM that improves prediction accuracy. Exactly locating breakpoints is usually impossible, since different subtypes are highly conserved in some sequence regions. Our algorithm identifies these sites up to a certain error tolerance. Our new algorithm is more accurate in predicting the location of recombination breakpoints. Our implementation of the algorithm is available at http://www.cs.uwaterloo.ca/~jmtruszk/jphmm_balls.tar.gz. Conclusions: By explicitly accounting for uncertainty in breakpoint positions, our algorithm offers more reliable predictions of recombination breakpoints in HIV-1. We also document a new domain of use for our new decoding approach in HMMs.
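As background for the baseline that this abstract refers to, the sketch below shows standard Viterbi decoding on a toy two-state HMM. It is not the jpHMM model or the authors' robust decoding algorithm: the states, the per-site observations ("a" or "b" for which parental subtype the query matches), and all probabilities are invented for illustration, and every probability is assumed to be strictly positive so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the single most probable state path, computed in log space."""
    # Scores for the first observation.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = []  # back[t][s] is the best predecessor of state s at step t + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: two parental subtypes, sticky transitions,
# and noisy per-site evidence about which parent the query resembles.
states = ["A", "B"]
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit_p = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.1, "b": 0.9}}

path = "".join(viterbi("aaaabbbb", states, start_p, trans_p, emit_p))
```

Viterbi commits to one exact breakpoint (here, between the fourth and fifth sites) even when nearby placements are almost equally likely, which is the limitation the abstract's error-tolerant decoding is meant to address.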
Affiliation(s)
- Jakub Truszkowski: David R Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada
8
Distribution of distances between topologies and its effect on detection of phylogenetic recombination. Ann Inst Stat Math 2009. [DOI: 10.1007/s10463-009-0259-8]
9
Schultz AK, Zhang M, Bulla I, Leitner T, Korber B, Morgenstern B, Stanke M. jpHMM: improving the reliability of recombination prediction in HIV-1. Nucleic Acids Res 2009; 37:W647-51. [PMID: 19443440 PMCID: PMC2703979 DOI: 10.1093/nar/gkp371]
Abstract
Previously, we developed jumping profile hidden Markov model (jpHMM), a new method to detect recombinations in HIV-1 genomes. The jpHMM predicts recombination breakpoints in a query sequence and assigns to each position of the sequence one of the major HIV-1 subtypes. Since incorrect subtype assignment or recombination prediction may lead to wrong conclusions in epidemiological or vaccine research, information about the reliability of the predicted parental subtypes and breakpoint positions is valuable. For this reason, we extended the output of jpHMM to include such information in terms of ‘uncertainty’ regions in the recombination prediction and an interval estimate of the breakpoint. Both types of information are computed based on the posterior probabilities of the subtypes at each query sequence position. Our results show that this extension strongly improves the reliability of the jpHMM recombination prediction. The jpHMM is available online at http://jphmm.gobics.de/.
Affiliation(s)
- Anne-Kathrin Schultz: Institut für Mikrobiologie und Genetik, Abteilung für Bioinformatik, Georg-August-Universität Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
10
Affiliation(s)
- Bruce Rannala: Genome Center and Department of Evolution and Ecology, University of California, Davis, California 95616
- Ziheng Yang: Department of Biology, University College London, London WC1E 6BT, United Kingdom; Laboratory of Biometrics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Tokyo, Japan
11
Martins LDO, Leal E, Kishino H. Phylogenetic detection of recombination with a Bayesian prior on the distance between trees. PLoS One 2008; 3:e2651. [PMID: 18612422 PMCID: PMC2440540 DOI: 10.1371/journal.pone.0002651] [Received: 04/11/2008] [Accepted: 06/07/2008]
Abstract
Genomic regions participating in recombination events may support distinct topologies, and phylogenetic analyses should incorporate this heterogeneity. Existing phylogenetic methods for recombination detection are challenged by the enormous number of possible topologies, even for a moderate number of taxa. If, however, the detection analysis is conducted independently between each putative recombinant sequence and a set of reference parentals, potential recombinations between the recombinants are neglected. In this context, a recombination hotspot can be inferred in phylogenetic analyses if we observe several consecutive breakpoints. We developed a distance measure between unrooted topologies that closely resembles the number of recombinations. By introducing a prior distribution on these recombination distances, a Bayesian hierarchical model was devised to detect phylogenetic inconsistencies occurring due to recombinations. This model relaxes the assumption of known parental sequences, still common in HIV analysis, allowing the entire dataset to be analyzed at once. On simulated datasets with up to 16 taxa, our method correctly detected recombination breakpoints and the number of recombination events for each breakpoint. The procedure is robust to rate and transition∶transversion heterogeneities for simulations with and without recombination. This recombination distance is related to recombination hotspots. Applying this procedure to a genomic HIV-1 dataset, we found evidence for hotspots and de novo recombination.