1
Enav H, Paz I, Ley RE. Strain tracking in complex microbiomes using synteny analysis reveals per-species modes of evolution. Nat Biotechnol 2024:10.1038/s41587-024-02276-2. [PMID: 38898177 DOI: 10.1038/s41587-024-02276-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 05/10/2024] [Indexed: 06/21/2024]
Abstract
Microbial species diversify into strains through single-nucleotide mutations and structural changes, such as recombination, insertions and deletions. Most strain-comparison methods quantify differences in single-nucleotide polymorphisms (SNPs) and are insensitive to structural changes. However, recombination is an important driver of phenotypic diversification in many species, including human pathogens. We introduce SynTracker, a tool that compares microbial strains using genome synteny (the order of sequence blocks in homologous genomic regions) in pairs of metagenomic assemblies or genomes. Genome synteny is a rich source of genomic information untapped by current strain-comparison tools. SynTracker has low sensitivity to SNPs, has no database requirement and is robust to sequencing errors. It outperforms existing tools when tracking strains in metagenomic data and is particularly suited for phages, plasmids and other low-data contexts. Applied to single-species datasets and human gut metagenomes, SynTracker, combined with an SNP-based tool, detects strains enriched in either point mutations or structural changes, providing insights into microbial evolution in situ.
Affiliation(s)
- Hagay Enav
  - Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Inbal Paz
  - Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Ruth E Ley
  - Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
  - Cluster of Excellence EXC 2124: Controlling Microbes to Fight Infections (CMFI), University of Tübingen, Tübingen, Germany
2
Mallawaarachchi S, Tonkin-Hill G, Pöntinen A, Calland J, Gladstone R, Arredondo-Alonso S, MacAlasdair N, Thorpe H, Top J, Sheppard S, Balding D, Croucher N, Corander J. Detecting co-selection through excess linkage disequilibrium in bacterial genomes. NAR Genom Bioinform 2024; 6:lqae061. [PMID: 38846349 PMCID: PMC11155488 DOI: 10.1093/nargab/lqae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 04/15/2024] [Accepted: 05/14/2024] [Indexed: 06/09/2024] [Open access]
Abstract
Population genomics has revolutionized our ability to study bacterial evolution by enabling data-driven discovery of the genetic architecture of trait variation. Genome-wide association studies (GWAS) have more recently become accompanied by genome-wide epistasis and co-selection (GWES) analysis, which offers a phenotype-free approach to generating hypotheses about selective processes that simultaneously impact multiple loci across the genome. However, existing GWES methods only consider associations between distant pairs of loci within the genome due to the strong impact of linkage disequilibrium (LD) over short distances. Based on the general functional organisation of genomes, it is nevertheless expected that the majority of co-selection and epistasis will act within relatively short genomic proximity, on co-variation occurring within genes and their promoter regions, and within operons. Here, we introduce LDWeaver, which enables an exhaustive GWES across both short- and long-range LD, to disentangle likely neutral co-variation from selection. We demonstrate the ability of LDWeaver to efficiently generate hypotheses about co-selection using large genomic surveys of multiple major human bacterial pathogen species and validate several findings using functional annotation and phenotypic measurements. Our approach will facilitate the study of bacterial evolution in the light of rapidly expanding population genomic data.
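As a rough illustration of the quantity underlying LD-based analyses, the sketch below computes the classical r² statistic between two biallelic loci. It is a textbook calculation under simplifying assumptions (haploid genomes, alleles coded 0/1), not LDWeaver's method; the function name `ld_r2` and the input encoding are hypothetical.

```python
def ld_r2(locus_a, locus_b):
    """Classical r^2 between two biallelic loci.

    Each argument is a list of 0/1 allele codes, one per genome,
    in the same genome order (assumed encoding, for illustration only).
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")

    p_a = sum(locus_a) / n   # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n   # frequency of allele 1 at locus B
    # frequency of genomes carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b     # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0           # a monomorphic locus carries no LD signal
    return d * d / denom


if __name__ == "__main__":
    perfectly_linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
    independent = ld_r2([0, 1, 0, 1], [0, 0, 1, 1])
    print(perfectly_linked, independent)
```

Values near 1 indicate that the two loci almost always co-vary; separating such co-variation caused by selection from co-variation caused by physical proximity is the problem the abstract describes.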
Affiliation(s)
- Anna K Pöntinen
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Jessica K Calland
  - Oslo Centre for Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
- Harry A Thorpe
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Janetta Top
  - Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
- Samuel K Sheppard
  - Ineos Oxford Institute of Antimicrobial Research, Department of Biology, University of Oxford, Oxford, United Kingdom
- David Balding
  - Melbourne Integrative Genomics, School of BioSciences and School of Mathematics & Statistics, University of Melbourne, Parkville, Victoria, Australia
- Nicholas J Croucher
  - Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, United Kingdom
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, United Kingdom
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Parasites and Microbes, Wellcome Sanger Institute, Cambridge, UK
  - Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
3
Shikov AE, Malovichko YV, Nizhnikov AA, Antonets KS. Current Methods for Recombination Detection in Bacteria. Int J Mol Sci 2022; 23:ijms23116257. [PMID: 35682936 PMCID: PMC9181119 DOI: 10.3390/ijms23116257] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/04/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 02/05/2023] [Open access]
Abstract
The role of genetic exchanges, i.e., homologous recombination (HR) and horizontal gene transfer (HGT), in bacteria cannot be overestimated, for it is a pivotal mechanism leading to their evolution and adaptation. Thus, tracking the signs of recombination and HGT events is important both for fundamental and applied science. To date, dozens of bioinformatics tools for revealing recombination signals are available; however, their pros and cons, as well as the spectra of solvable tasks, have not yet been systematically reviewed. Moreover, there are two major groups of software. One aims to infer evidence of HR, while the other deals only with HGT. However, despite seemingly different goals, all the methods use similar algorithmic approaches, and the two processes are interconnected, influencing each other in the course of genomic evolution. In this review, we propose a classification of novel instruments for both HR and HGT detection based on the genomic consequences of recombination. In this context, we summarize available methodologies, paying particular attention to the type of traceable events for which a certain program has been designed.
Affiliation(s)
- Anton E. Shikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia; (A.E.S.); (Y.V.M.); (A.A.N.)
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Yury V. Malovichko
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia; (A.E.S.); (Y.V.M.); (A.A.N.)
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Anton A. Nizhnikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia; (A.E.S.); (Y.V.M.); (A.A.N.)
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Kirill S. Antonets
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia; (A.E.S.); (Y.V.M.); (A.A.N.)
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
  - Correspondence:
4
Calland JK, Pascoe B, Bayliss SC, Mourkas E, Berthenet E, Thorpe HA, Hitchings MD, Feil EJ, Corander J, Blaser MJ, Falush D, Sheppard SK. Quantifying bacterial evolution in the wild: A birthday problem for Campylobacter lineages. PLoS Genet 2021; 17:e1009829. [PMID: 34582435 PMCID: PMC8500405 DOI: 10.1371/journal.pgen.1009829] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/19/2021] [Revised: 10/08/2021] [Accepted: 09/20/2021] [Indexed: 11/20/2022] [Open access]
Abstract
Measuring molecular evolution in bacteria typically requires estimation of the rate at which nucleotide changes accumulate in strains sampled at different times that share a common ancestor. This approach has been useful for dating ecological and evolutionary events that coincide with the emergence of important lineages, such as outbreak strains and obligate human pathogens. However, in multi-host (niche) transmission scenarios, where the pathogen is essentially an opportunistic environmental organism, sampling is often sporadic and rarely reflects the overall population, particularly when concentrated on clinical isolates. This means that approaches that assume recent common ancestry are not applicable. Here we present a new approach to estimate the molecular clock rate in Campylobacter that draws on the popular probability conundrum known as the 'birthday problem'. Using large genomic datasets and comparative genomic approaches, we use isolate pairs that share recent common ancestry to estimate the rate of nucleotide change for the population. Identifying synonymous and non-synonymous nucleotide changes, both within and outside of recombined regions of the genome, we quantify clock-like diversification to estimate synonymous rates of nucleotide change for the common pathogenic bacteria Campylobacter coli (2.4 × 10⁻⁶ s/s/y) and Campylobacter jejuni (3.4 × 10⁻⁶ s/s/y). Finally, using estimated total rates of nucleotide change, we infer the number of effective lineages within the sample time frame (analogous to a shared birthday) and assess the rate of turnover of lineages in our sample set over short evolutionary timescales. This provides a generalizable approach to calibrating rates in populations of environmental bacteria and shows that multiple lineages are maintained, implying that large-scale clonal sweeps may take hundreds of years or more in these species.
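For readers unfamiliar with the 'birthday problem' the abstract alludes to, the following sketch computes the classical probability that at least two of n samples share a "birthday" among a fixed number of equally likely categories. It illustrates only the probability conundrum itself, not the authors' estimation procedure; the function name and the mapping of "days" to lineages are assumptions made for illustration.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n samples fall in the same category.

    `days` is the number of equally likely categories (365 in the classic
    puzzle; by loose analogy, the number of lineages in a population).
    """
    if n > days:
        return 1.0           # pigeonhole principle: a collision is certain
    p_all_distinct = 1.0
    for i in range(n):
        # the (i+1)-th sample must avoid the i categories already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # the well-known result: 23 people suffice for a better-than-even chance
    print(round(shared_birthday_probability(23), 4))
```

The intuition carried over to genomics is the reverse inference: observing how often sampled isolates "collide" (share very recent ancestry) constrains how many distinct lineages are circulating.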
Affiliation(s)
- Jessica K. Calland
  - The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Ben Pascoe
  - The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Sion C. Bayliss
  - The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Evangelos Mourkas
  - The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Elvire Berthenet
  - French National Reference Center for Campylobacters and Helicobacters, University of Bordeaux, Bordeaux, France
  - Institute of Life Sciences, Swansea University Medical School, Swansea University, Singleton Park, Swansea, United Kingdom
- Harry A. Thorpe
  - The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
  - Department of Biostatistics, University of Oslo, Oslo, Norway
- Matthew D. Hitchings
  - Institute of Life Sciences, Swansea University Medical School, Swansea University, Singleton Park, Swansea, United Kingdom
- Edward J. Feil
  - The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Jukka Corander
  - Department of Biostatistics, University of Oslo, Oslo, Norway
  - Department of Mathematics and Statistics, Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
  - Parasites and Microbes, Wellcome Sanger Institute, Cambridge, United Kingdom
- Martin J. Blaser
  - Center for Advanced Biotechnology and Medicine, Rutgers University, New Brunswick, New Jersey, United States of America
- Daniel Falush
  - Centre for Microbes, Development and Health, Institute Pasteur of Shanghai, Shanghai, China
  - * E-mail: (DF); (SKS)
- Samuel K. Sheppard
  - The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
  - Department of Zoology, University of Oxford, Oxford, United Kingdom
  - * E-mail: (DF); (SKS)
5
Bobay LM. CoreSimul: a forward-in-time simulator of genome evolution for prokaryotes modeling homologous recombination. BMC Bioinformatics 2020; 21:264. [PMID: 32580695 PMCID: PMC7315543 DOI: 10.1186/s12859-020-03619-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/02/2020] [Accepted: 06/19/2020] [Indexed: 12/26/2022] [Open access]
Abstract
Background: Prokaryotes are asexual, but these organisms frequently engage in homologous recombination, a process that differs from meiotic recombination in sexual organisms. Most tools developed to simulate genome evolution either assume sexual reproduction or the complete absence of DNA flux in the population. As a result, very few simulators are adapted to model prokaryotic genome evolution while accounting for recombination. Moreover, many simulators are based on the coalescent, which assumes a neutral model of genomic evolution, and those are best suited for organisms evolving under weak selective pressures, such as animals and plants. In contrast, prokaryotes are thought to be evolving under much stronger selective pressures, suggesting that forward-in-time simulators are better suited for these organisms.
Results: Here, I present CoreSimul, a forward-in-time simulator of core genome evolution for prokaryotes modeling homologous recombination. Simulations are guided by a phylogenetic tree and incorporate different substitution models, including models of codon selection.
Conclusions: CoreSimul is a flexible forward-in-time simulator that constitutes a significant addition to the limited list of available simulators applicable to prokaryote genome evolution.
Affiliation(s)
- Louis-Marie Bobay
  - Department of Biology, University of North Carolina Greensboro, 321 McIver Street, PO Box 26170, Greensboro, NC, 27402, USA
6
Saber MM, Shapiro BJ. Benchmarking bacterial genome-wide association study methods using simulated genomes and phenotypes. Microb Genom 2020; 6:e000337. [PMID: 32100713 PMCID: PMC7200059 DOI: 10.1099/mgen.0.000337] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/16/2019] [Accepted: 01/23/2020] [Indexed: 11/18/2022] [Open access]
Abstract
Genome-wide association studies (GWASs) have the potential to reveal the genetics of microbial phenotypes such as antibiotic resistance and virulence. Capitalizing on the growing wealth of bacterial sequence data, microbial GWAS methods aim to identify causal genetic variants while ignoring spurious associations. Bacteria reproduce clonally, leading to strong population structure and genome-wide linkage, making it challenging to separate true 'hits' (i.e. mutations that cause a phenotype) from non-causal linked mutations. GWAS methods attempt to correct for population structure in different ways, but their performance has not yet been systematically and comprehensively evaluated under a range of evolutionary scenarios. Here, we developed a bacterial GWAS simulator (BacGWASim) to generate bacterial genomes with varying rates of mutation, recombination and other evolutionary parameters, along with a subset of causal mutations underlying a phenotype of interest. We assessed the performance (recall and precision) of three widely used single-locus GWAS approaches (cluster-based, dimensionality-reduction and linear mixed models, implemented in plink, pyseer and gemma) and one relatively new multi-locus model implemented in pyseer, across a range of simulated sample sizes, recombination rates and causal mutation effect sizes. As expected, all methods performed better with larger sample sizes and effect sizes. The performance of clustering and dimensionality-reduction approaches to correct for population structure varied considerably with the choice of parameters. Notably, the multi-locus elastic net (lasso) approach was consistently amongst the highest-performing methods, and had the highest power in detecting causal variants with both low and high effect sizes. Most methods reached the level of good performance (recall >0.75) for identifying causal mutations of strong effect size [log odds ratio (OR) ≥2] with a sample size of 2000 genomes. However, only elastic nets reached the level of reasonable performance (recall=0.35) for detecting markers with weaker effects (log OR ~1) in smaller samples. Elastic nets also showed superior precision and recall in controlling for genome-wide linkage, relative to single-locus models. However, all methods performed relatively poorly on highly clonal (low-recombining) genomes, suggesting room for improvement in method development. These findings show the potential for multi-locus models to improve bacterial GWAS performance. BacGWASim code and simulated data are publicly available to enable further comparisons and benchmarking of new methods.
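The two benchmarking metrics named in the abstract, precision and recall, can be sketched as follows. This is a generic calculation on sets of variant identifiers, not BacGWASim's evaluation code; the function name and the variant labels in the usage example are hypothetical.

```python
def precision_recall(predicted_hits, causal_variants):
    """Precision and recall of a GWAS hit list against known causal variants.

    Both arguments are iterables of variant identifiers (assumed format).
    """
    predicted = set(predicted_hits)
    causal = set(causal_variants)
    true_positives = len(predicted & causal)

    # precision: fraction of reported hits that are truly causal
    precision = true_positives / len(predicted) if predicted else 0.0
    # recall: fraction of causal variants that were recovered
    recall = true_positives / len(causal) if causal else 0.0
    return precision, recall


if __name__ == "__main__":
    # hypothetical variant labels, for illustration only
    hits = ["snp_1", "snp_2", "snp_7", "snp_9"]
    truth = ["snp_1", "snp_2", "snp_5"]
    print(precision_recall(hits, truth))
```

In a simulation study the causal set is known by construction, which is what makes recall thresholds such as those quoted in the abstract measurable.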
Affiliation(s)
- Morteza M. Saber
  - Département de Sciences Biologiques, Université de Montréal, Montréal, QC, Canada
- B. Jesse Shapiro
  - Département de Sciences Biologiques, Université de Montréal, Montréal, QC, Canada
7
Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Res 2019; 29:304-316. [PMID: 30679308 PMCID: PMC6360808 DOI: 10.1101/gr.241455.118] [Citation(s) in RCA: 187] [Impact Index Per Article: 37.4] [Received: 07/05/2018] [Accepted: 12/10/2018] [Indexed: 12/02/2022]
Abstract
The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), a software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates' divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species' diverse evolutionary patterns. PopPUNK can process 10³–10⁴ genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, which is achieved without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication are streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.
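To give a feel for what a "k-mer comparison" at several k-mer lengths involves, the sketch below computes an exact Jaccard similarity between the k-mer sets of two sequences for a range of k. It is a deliberately simplified illustration and not PopPUNK's implementation; the function names and the toy sequences are assumptions.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(seq_a, seq_b, k):
    """Jaccard similarity of the k-mer sets of two sequences."""
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        return 0.0           # both sequences shorter than k
    return len(kmers_a & kmers_b) / len(union)


if __name__ == "__main__":
    # toy sequences differing by a single substitution (illustrative only)
    a = "ACGTTGCATGCCGTA"
    b = "ACGTTGCTTGCCGTA"
    for k in (3, 5, 7):
        print(k, round(jaccard(a, b, k), 3))
```

A single substitution disrupts up to k overlapping k-mers, so similarity typically falls as k grows, whereas missing sequence lowers similarity at every k. That difference in behaviour across k is the kind of signal that allows divergence in shared sequence to be distinguished from divergence in gene content.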
Affiliation(s)
- John A Lees
  - Department of Microbiology, New York University School of Medicine, New York, New York 10016, USA
- Simon R Harris
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Gerry Tonkin-Hill
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Rebecca A Gladstone
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Stephanie W Lo
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- Jeffrey N Weiser
  - Department of Microbiology, New York University School of Medicine, New York, New York 10016, USA
- Jukka Corander
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
  - Department of Biostatistics, University of Oslo, 0372 Oslo, Norway
  - Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland
- Stephen D Bentley
  - Parasites and Microbes, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
  - Institute of Infection and Global Health, University of Liverpool, Liverpool L7 3EA, United Kingdom
  - Department of Pathology, University of Cambridge, Cambridge CB2 1QP, United Kingdom
- Nicholas J Croucher
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London W2 1PG, United Kingdom