1. Sommer H, Djamalova D, Galardini M. Reduced ambiguity and improved interpretability of bacterial genome-wide associations using gene-cluster-centric k-mers. Microb Genom 2023; 9. [PMID: 37934071 DOI: 10.1099/mgen.0.001129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023] Open
Abstract
The wide adoption of bacterial genome sequencing and encoding both core and accessory genome variation using k-mers has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes such as those linked to infection. Significant limitations still remain because of k-mers being duplicated across gene clusters and as far as the interpretation of association results is concerned, which affects the wider adoption of GWAS methods on microbial data sets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to their gene cluster at base-resolution level, which allows us to avoid biases introduced by a global de Bruijn graph as well as more easily map and annotate associated variants. We tested panfeed on two independent data sets, correctly identifying previously characterized causal variants, which demonstrates the precision of the method, as well as its scalable performance. panfeed is a command line tool written in the python programming language and is available at https://github.com/microbial-pangenomes-lab/panfeed.
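The gene-cluster-centric idea in this abstract can be illustrated with a minimal Python sketch. It is not panfeed's implementation: the function names, the `(strain, cluster, sequence)` input shape, and the toy sequences are all assumptions made for illustration. It only shows how keying each k-mer by its gene cluster keeps identical k-mers from different clusters apart while retaining their base-level offsets.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield (offset, k-mer) for every window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def cluster_kmer_index(genes, k):
    """Map (cluster, k-mer) -> set of (strain, offset).

    `genes` is a hypothetical iterable of (strain, cluster, sequence)
    tuples; real tools would derive these from annotated assemblies.
    """
    index = defaultdict(set)
    for strain, cluster, seq in genes:
        for offset, kmer in kmers(seq, k):
            index[(cluster, kmer)].add((strain, offset))
    return index


# Toy data: "ATG" occurs in two clusters but is indexed separately.
genes = [
    ("strainA", "clusterX", "ATGCA"),
    ("strainB", "clusterX", "ATGGA"),
    ("strainA", "clusterY", "TTATG"),
]
index = cluster_kmer_index(genes, 3)
```

Because the key includes the cluster, a k-mer shared by two clusters yields two separate presence/absence patterns rather than one merged pattern.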
Affiliation(s)
- Hannes Sommer
- Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
- Dilfuza Djamalova
- Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
- Marco Galardini
- Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
2. Singh S, Hu X, Dixelius C. Dynamics of nucleic acid mobility. Genetics 2023; 225:iyad132. [PMID: 37491977 PMCID: PMC10471207 DOI: 10.1093/genetics/iyad132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Accepted: 07/10/2023] [Indexed: 07/27/2023] Open
Abstract
Advances in sequencing technologies and bioinformatic analyses are accelerating the quantity and quality of data from all domains of life. This rich resource has the potential to reveal a number of important incidences with respect to possible exchange of nucleic acids. Ancient events have impacted species evolution and adaptation to new ecological niches. However, we still lack a full picture of processes ongoing within and between somatic cells, gametes, and different organisms. We propose that events linked to acceptance of alien nucleic acids grossly could be divided into 2 main routes in plants: one, when plants are exposed to extreme challenges and, the second level, a more everyday or season-related stress incited by biotic or abiotic factors. Here, many events seem to comprise somatic cells. Are the transport and acceptance processes of alien sequences random or are there specific regulatory systems not yet fully understood? Following entrance into a new cell, a number of intracellular processes leading to chromosomal integration and function are required. Modification of nucleic acids and possibly exchange of sequences within a cell may also occur. Such fine-tune events are most likely very common. There are multiple questions that we will discuss concerning different types of vesicles and their roles in nucleic acid transport and possible intracellular sequence exchange between species.
Affiliation(s)
- Shailja Singh
- Department of Plant Biology, Uppsala BioCenter, Linnéan Center for Plant Biology, Swedish University of Agricultural Sciences, P.O. Box 7080, Uppsala, SE-75007, Sweden
- Xinyi Hu
- Department of Plant Biology, Uppsala BioCenter, Linnéan Center for Plant Biology, Swedish University of Agricultural Sciences, P.O. Box 7080, Uppsala, SE-75007, Sweden
- Christina Dixelius
- Department of Plant Biology, Uppsala BioCenter, Linnéan Center for Plant Biology, Swedish University of Agricultural Sciences, P.O. Box 7080, Uppsala, SE-75007, Sweden
3. Garza DR, von Meijenfeldt FAB, van Dijk B, Boleij A, Huynen MA, Dutilh BE. Nutrition or nature: using elementary flux modes to disentangle the complex forces shaping prokaryote pan-genomes. BMC Ecol Evol 2022; 22:101. [PMID: 35974327 PMCID: PMC9382767 DOI: 10.1186/s12862-022-02052-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Accepted: 07/22/2022] [Indexed: 11/15/2022] Open
Abstract
Background: Microbial pan-genomes are shaped by a complex combination of stochastic and deterministic forces. Even closely related genomes exhibit extensive variation in their gene content. Understanding what drives this variation requires exploring the interactions of gene products with each other and with the organism’s external environment. However, to date, conceptual models of pan-genome dynamics often represent genes as independent units and provide limited information about their mechanistic interactions. Results: We simulated the stochastic process of gene-loss using the pooled genome-scale metabolic reaction networks of 46 taxonomically diverse bacterial and archaeal families as proxies for their pan-genomes. The frequency by which reactions are retained in functional networks when stochastic gene loss is simulated in diverse environments allowed us to disentangle the metabolic reactions whose presence depends on the metabolite composition of the external environment (constrained by “nutrition”) from those that are independent of the environment (constrained by “nature”). By comparing the frequency of reactions from the first group with their observed frequencies in bacterial and archaeal families, we predicted the metabolic niches that shaped the genomic composition of these lineages. Moreover, we found that the lineages that were shaped by a more diverse metabolic niche also occur in more diverse biomes as assessed by global environmental sequencing datasets. Conclusion: We introduce a computational framework for analyzing and interpreting pan-reactomes that provides novel insights into the ecological and evolutionary drivers of pan-genome dynamics. Supplementary Information: The online version contains supplementary material available at 10.1186/s12862-022-02052-3.
4. Preska Steinberg A, Lin M, Kussell E. Core genes can have higher recombination rates than accessory genes within global microbial populations. eLife 2022; 11:78533. [PMID: 35801696 PMCID: PMC9444244 DOI: 10.7554/elife.78533] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/10/2022] [Accepted: 06/30/2022] [Indexed: 11/24/2022] Open
Abstract
Recombination is essential to microbial evolution, and is involved in the spread of antibiotic resistance, antigenic variation, and adaptation to the host niche. However, assessing the impact of homologous recombination on accessory genes which are only present in a subset of strains of a given species remains challenging due to their complex phylogenetic relationships. Quantifying homologous recombination for accessory genes (which are important for niche-specific adaptations) in comparison to core genes (which are present in all strains and have essential functions) is critical to understanding how selection acts on variation to shape species diversity and genome structures of bacteria. Here, we apply a computationally efficient, non-phylogenetic approach to measure homologous recombination rates in the core and accessory genome using >100,000 whole genome sequences from Streptococcus pneumoniae and several additional species. By analyzing diverse sets of sequence clusters, we show that core genes often have higher recombination rates than accessory genes, and for some bacterial species the associated effect sizes for these differences are pronounced. In a subset of species, we find that gene frequency and homologous recombination rate are positively correlated. For S. pneumoniae and several additional species, we find that while the recombination rate is higher for the core genome, the mutational divergence is lower, indicating that divergence-based homologous recombination barriers could contribute to differences in recombination rates between the core and accessory genome. Homologous recombination may therefore play a key role in increasing the efficiency of selection in the most conserved parts of the genome.
Affiliation(s)
- Mingzhi Lin
- Department of Biology, New York University, New York, United States
- Edo Kussell
- Department of Biology, New York University, New York, United States
5.
Abstract
Horizontal gene transfer (HGT) is arguably the most conspicuous feature of bacterial evolution. Evidence for HGT is found in most bacterial genomes. Although HGT can considerably alter bacterial genomes, not all transfer events may be biologically significant and may instead represent the outcome of an incessant evolutionary process that only occasionally has a beneficial purpose. When adaptive transfers occur, HGT and positive selection may result in specific, detectable signatures in genomes, such as gene-specific sweeps or increased transfer rates for genes that are ecologically relevant. In this Review, we first discuss the various mechanisms whereby HGT occurs, how the genetic signatures shape patterns of genomic variation and the distinct bioinformatic algorithms developed to detect these patterns. We then discuss the evolutionary theory behind HGT and positive selection in bacteria, and discuss the approaches developed over the past decade to detect transferred DNA that may be involved in adaptation to new environments.
6. Li Y, Jiang B, Dai W. A large-scale whole-genome sequencing analysis reveals false positives of bacterial essential genes. Appl Microbiol Biotechnol 2021; 106:341-347. [PMID: 34889987 DOI: 10.1007/s00253-021-11702-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Revised: 11/05/2021] [Accepted: 11/15/2021] [Indexed: 11/26/2022]
Abstract
Essential genes are crucial for bacterial viability and represent attractive targets for novel anti-pathogen drug discovery. However, essential genes determined by the transposon insertion sequencing (Tn-seq) approach often contain many false positives. We hypothesized that some of those false positives are genes that are actually deleted from the genome, so they do not present any transposon insertion in the course of Tn-seq analysis. Based on this assumption, we performed a large-scale whole-genome sequencing analysis for the bacterium of interest. Our analysis revealed that some "essential genes" are indeed removed from the analyzed bacterial genomes. Since these genes were kicked out by bacteria, they should not be defined as essential. Our work showed that gene deletion is one of the false positive sources of essentiality determination, which is apparently underestimated in previous studies. We suggest subtracting the genome backgrounds before the evaluation of Tn-seq, and created a list of false positive gene essentiality as a reference for the downstream application. KEY POINTS: • Discovery of false positives of essential genes defined previously through the analyses of a large scale of whole-genome sequencing data • These false positives are the results of gene deletions in the studied genomes • Sequencing the target genome before Tn-seq analysis is of importance while some studies neglected it.
Affiliation(s)
- Yuanhao Li
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, 510006, China
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Center, South China Agricultural University, Guangzhou, 510642, China
- Bo Jiang
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, 510006, China
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Center, South China Agricultural University, Guangzhou, 510642, China
- Weijun Dai
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, 510006, China.
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Center, South China Agricultural University, Guangzhou, 510642, China.
7. Colquhoun RM, Hall MB, Lima L, Roberts LW, Malone KM, Hunt M, Letcher B, Hawkey J, George S, Pankhurst L, Iqbal Z. Pandora: nucleotide-resolution bacterial pan-genomics with reference graphs. Genome Biol 2021; 22:267. [PMID: 34521456 PMCID: PMC8442373 DOI: 10.1186/s13059-021-02473-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/26/2020] [Accepted: 08/19/2021] [Indexed: 12/21/2022] Open
Abstract
We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.
Affiliation(s)
- Rachel M Colquhoun
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
- Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, UK
- Michael B Hall
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Leandro Lima
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Leah W Roberts
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Kerri M Malone
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Martin Hunt
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Brice Letcher
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Jane Hawkey
- Department of Infectious Diseases, Central Clinical School, Monash University, Melbourne, Victoria, 3004, Australia
- Sophie George
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Louise Pankhurst
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Department of Zoology, University of Oxford, Mansfield Road, Oxford, UK
- Zamin Iqbal
- European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK.
8. Sela I, Wolf YI, Koonin EV. Assessment of assumptions underlying models of prokaryotic pangenome evolution. BMC Biol 2021; 19:27. [PMID: 33563283 PMCID: PMC7874442 DOI: 10.1186/s12915-021-00960-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/27/2020] [Accepted: 01/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The genomes of bacteria and archaea evolve by extensive loss and gain of genes which, for any group of related prokaryotic genomes, result in the formation of a pangenome with the universal, asymmetrical U-shaped distribution of gene commonality. However, the evolutionary factors that define the specific shape of this distribution are not thoroughly understood. RESULTS We investigate the fit of simple models of genome evolution to the empirically observed gene commonality distributions and genome intersections for 33 groups of closely related bacterial genomes. A model with an infinite external gene pool available for gene acquisition and constant genome size (IGP-CGS model), and two gene turnover rates, one for slow- and the other one for fast-evolving genes, allows two approaches to estimate the parameters for gene content dynamics. One is by fitting the model prediction to the distribution of the number of genes shared by precisely k genomes (gene commonality distribution) and another by analyzing the distribution of the number of genes common for k genome sets (k-cores). Both approaches produce a comparable overall quality of fit, although the former significantly overestimates the number of the universally conserved genes, while the latter overestimates the number of singletons. We further explore the effect of dropping each of the assumptions of the IGP-CGS model on the fit to the gene commonality distributions and show that models with either a finite gene pool or unequal rates of gene loss and gain (greater gene loss rate) eliminate the overestimate of the number of singletons or the core genome size. CONCLUSIONS We examine the assumptions that are usually adopted for modeling the evolution of the U-shaped gene commonality distributions in prokaryote genomes, namely, those of infinitely many genes and constant genome size. 
The combined analysis of genome intersections and gene commonality suggests that at least one of these assumptions is invalid. The violation of both these assumptions reflects the limited ability of prokaryotes to gain new genes. This limitation seems to stem, at least partly, from the horizontal gene transfer barrier, i.e., the cost of accommodation of foreign genes by prokaryotes. Further development of models taking into account the complexity of microbial evolution is necessary for an improved understanding of the evolution of prokaryotes.
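The gene commonality distribution mentioned in this abstract (the number of genes shared by precisely k genomes) can be computed from gene presence data with a short Python sketch. The function name and the toy gene sets are assumptions made for illustration, and the sketch covers only the empirical distribution, not the IGP-CGS model fitting described by the authors.

```python
from collections import Counter


def gene_commonality(genomes):
    """Return {k: number of genes present in exactly k genomes}.

    `genomes` is a hypothetical list of gene-identifier collections,
    one per genome; duplicates within a genome are counted once.
    """
    presence = Counter()
    for gene_set in genomes:
        presence.update(set(gene_set))
    return dict(sorted(Counter(presence.values()).items()))


# Toy data: gene "a" is universal, "b" is shared by two genomes,
# and "c", "d", "e" are singletons.
genomes = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e"}]
dist = gene_commonality(genomes)
```

With real pangenome data, plotting `dist` would be expected to give the U-shaped curve the abstract refers to, with peaks at singletons and at universally conserved genes.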
Affiliation(s)
- Itamar Sela
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
9. Koonin EV, Makarova KS, Wolf YI. Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century. Trends Microbiol 2021; 29:582-592. [PMID: 33541841 DOI: 10.1016/j.tim.2021.01.005] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 12/03/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 12/20/2022]
Abstract
Prokaryote genomics started in earnest in 1995, with the complete sequences of two small bacterial genomes, those of Haemophilus influenzae and Mycoplasma genitalium. During the next quarter century, the prokaryote genome database has been growing exponentially, with no saturation in sight. For most of these 25 years, genome sequencing remained limited to cultivable microbes. Together with next-generation sequencing methods, advances in metagenomics and single-cell genomics have lifted this limitation, providing for an increasingly unbiased characterization of the global prokaryote diversity. Advances in computational genomics followed the progress of genome sequencing, even if occasionally lagging behind. Several major new branches of bacteria and archaea were discovered, including Asgard archaea, the apparent closest relatives of eukaryotes and expansive groups of bacteria and archaea with small genomes thought to be symbionts of other prokaryotes. Comparative analysis of numerous prokaryote genomes spanning a wide range of evolutionary distances changed the conceptual foundations of microbiology, supplanting the notion of species genomes with fixed gene sets with that of dynamic pangenomes and the notion of a single Tree of Life (ToL) with a statistical tree-like trend among individual gene trees. Strides were also made towards a theory and quantitative laws of prokaryote genome evolution.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA.
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
10. Domingo-Sananes MR, McInerney JO. Mechanisms That Shape Microbial Pangenomes. Trends Microbiol 2021; 29:493-503. [PMID: 33423895 DOI: 10.1016/j.tim.2020.12.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/12/2020] [Revised: 12/09/2020] [Accepted: 12/10/2020] [Indexed: 01/02/2023]
Abstract
Analyses of multiple whole-genome sequences from the same species have revealed that differences in gene content can be substantial, particularly in prokaryotes. Such variation has led to the recognition of pangenomes, the complete set of genes present in a species - consisting of core genes, present in all individuals, and accessory genes whose presence is variable. Questions now arise about how pangenomes originate and evolve. We describe how gene content variation can arise as a result of the combination of several processes, including random drift, selection, gain/loss balance, and the influence of ecological and epistatic interactions. We believe that identifying the contributions of these processes to pangenomes will need novel theoretical approaches and empirical data.
Affiliation(s)
- Maria Rosa Domingo-Sananes
- School of Life Sciences, University of Nottingham, Nottingham, UK; School of Science and Technology, Nottingham Trent University, Nottingham, UK.
11. Richard D, Pruvost O, Balloux F, Boyer C, Rieux A, Lefeuvre P. Time-calibrated genomic evolution of a monomorphic bacterium during its establishment as an endemic crop pathogen. Mol Ecol 2020; 30:1823-1835. [PMID: 33305421 DOI: 10.1111/mec.15770] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/04/2020] [Revised: 11/30/2020] [Accepted: 12/03/2020] [Indexed: 01/03/2023]
Abstract
Horizontal gene transfer is of major evolutionary importance as it allows for the redistribution of phenotypically important genes among lineages. Such genes with essential functions include those involved in resistance to antimicrobial compounds and virulence factors in pathogenic bacteria. Understanding gene turnover at microevolutionary scales is critical to assess the pace of this evolutionary process. Here, we characterized and quantified gene turnover for the epidemic lineage of a bacterial plant pathogen of major agricultural importance worldwide. Relying on a dense geographic sampling spanning 39 years of evolution, we estimated both the dynamics of single nucleotide polymorphism accumulation and gene content turnover. We identified extensive gene content variation among lineages even at the smallest phylogenetic and geographic scales. Gene turnover rate exceeded nucleotide substitution rate by three orders of magnitude. Accessory genes were found preferentially located on plasmids, but we identified a highly plastic chromosomal region hosting ecologically important genes such as transcription activator-like effectors. Whereas most changes in the gene content are probably transient, the rapid spread of a mobile element conferring resistance to copper compounds widely used for the management of plant bacterial pathogens illustrates how some accessory genes can become ubiquitous within a population over short timeframes.
Affiliation(s)
- Damien Richard
- Cirad, UMR PVBMT, Réunion, France; ANSES, Plant Health Laboratory, Réunion, France; Université de la Réunion, UMR PVBMT, Réunion, France
12. Tian L, Wang XW, Wu AK, Fan Y, Friedman J, Dahlin A, Waldor MK, Weinstock GM, Weiss ST, Liu YY. Deciphering functional redundancy in the human microbiome. Nat Commun 2020; 11:6217. [PMID: 33277504 PMCID: PMC7719190 DOI: 10.1038/s41467-020-19940-1] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Received: 12/19/2019] [Accepted: 11/04/2020] [Indexed: 02/07/2023] Open
Abstract
Although the taxonomic composition of the human microbiome varies tremendously across individuals, its gene composition or functional capacity is highly conserved - implying an ecological property known as functional redundancy. Such functional redundancy has been hypothesized to underlie the stability and resilience of the human microbiome, but this hypothesis has never been quantitatively tested. The origin of functional redundancy is still elusive. Here, we investigate the basis for functional redundancy in the human microbiome by analyzing its genomic content network - a bipartite graph that links microbes to the genes in their genomes. We find that this network exhibits several topological features that favor high functional redundancy. Furthermore, we develop a simple genome evolution model to generate genomic content network, finding that moderate selection pressure and high horizontal gene transfer rate are necessary to generate genomic content networks with key topological features that favor high functional redundancy. Finally, we analyze data from two published studies of fecal microbiota transplantation (FMT), finding that high functional redundancy of the recipient's pre-FMT microbiota raises barriers to donor microbiota engraftment. This work elucidates the potential ecological and evolutionary processes that create and maintain functional redundancy in the human microbiome and contribute to its resilience.
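The genomic content network described in this abstract, a bipartite graph linking microbes to the genes in their genomes, can be represented with plain dictionaries. The Python sketch below rests on stated assumptions: the function names, the `min_carriers` threshold, and the toy taxa are hypothetical, and counting carriers per gene is only a crude proxy, not the functional redundancy measure used by the authors.

```python
from collections import defaultdict


def gene_carriers(genomic_content):
    """Invert a microbe -> genes mapping into gene -> set of microbes.

    Together the two mappings describe both sides of the bipartite graph.
    """
    carriers = defaultdict(set)
    for microbe, genes in genomic_content.items():
        for gene in genes:
            carriers[gene].add(microbe)
    return dict(carriers)


def redundant_genes(genomic_content, min_carriers=2):
    """List genes carried by at least `min_carriers` microbes (sorted)."""
    carriers = gene_carriers(genomic_content)
    return sorted(g for g, m in carriers.items() if len(m) >= min_carriers)


# Toy network: g1 is carried by every taxon, g3 by only one.
content = {
    "taxon1": {"g1", "g2"},
    "taxon2": {"g1", "g3"},
    "taxon3": {"g1", "g2"},
}
carriers = gene_carriers(content)
shared = redundant_genes(content)
```

A gene with many carriers can be lost from one taxon without disappearing from the community, which is the intuition behind redundancy in such a network.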
Affiliation(s)
- Liang Tian
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Department of Physics, Hong Kong Baptist University, Hong Kong SAR, China
- Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Hong Kong SAR, China
- State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong SAR, China
- Xu-Wen Wang
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Ang-Kun Wu
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Department of Physics and Astronomy, Rutgers University, Piscataway, NJ, 08854, USA
- Yuhang Fan
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Jonathan Friedman
- Faculty of Agriculture, Food and Environment, Department of Plant Pathology and Microbiology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Amber Dahlin
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Matthew K Waldor
- Division of Infectious Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Howard Hughes Medical Institute, Boston, MA, 02115, USA
- Scott T Weiss
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA
- Yang-Yu Liu
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, 02115, USA.
13. Tovo A, Menzel P, Krogh A, Cosentino Lagomarsino M, Suweis S. Taxonomic classification method for metagenomics based on core protein families with Core-Kaiju. Nucleic Acids Res 2020; 48:e93. [PMID: 32633756 PMCID: PMC7498351 DOI: 10.1093/nar/gkaa568] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/14/2020] [Revised: 06/12/2020] [Accepted: 06/24/2020] [Indexed: 12/19/2022] Open
Abstract
Characterizing species diversity and composition of bacteria hosted by biota is revolutionizing our understanding of the role of symbiotic interactions in ecosystems. Determining microbiomes diversity implies the assignment of individual reads to taxa by comparison to reference databases. Although computational methods aimed at identifying the microbe(s) taxa are available, it is well known that inferences using different methods can vary widely depending on various biases. In this study, we first apply and compare different bioinformatics methods based on 16S ribosomal RNA gene and shotgun sequencing to three mock communities of bacteria, of which the compositions are known. We show that none of these methods can infer both the true number of taxa and their abundances. We thus propose a novel approach, named Core-Kaiju, which combines the power of shotgun metagenomics data with a more focused marker gene classification method similar to 16S, but based on emergent statistics of core protein domain families. We thus test the proposed method on various mock communities and we show that Core-Kaiju reliably predicts both number of taxa and abundances. Finally, we apply our method on human gut samples, showing how Core-Kaiju may give more accurate ecological characterization and a fresh view on real microbiomes.
Affiliation(s)
- Anna Tovo
- Physics and Astronomy Department, LIPh Lab, University of Padova, Via Marzolo 8, 35131 Padova, Italy; Mathematics Department, University of Padova, via Trieste 63, 35121 Padova, Italy
- Peter Menzel
- Labor Berlin Charité Vivantes GmbH, Sylter Str. 2, 13353 Berlin, Germany
- Anders Krogh
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen, Denmark
| | - Marco Cosentino Lagomarsino
- IFOM, FIRC Institute of Molecular Oncology, Via Adamello 16, 20143 Milan, Italy
- Physics Department, University of Milan, and I.N.F.N., Via Celoria 16, 20133 Milan, Italy
| | - Samir Suweis
- Physics and Astronomy Department, LIPh Lab, University of Padova, Via Marzolo 8, 35131 Padova, Italy
- Padova Neuroscience Center, University of Padova, Via Orus 2/B, 35131 Padova, Italy
|
14
|
Koonin EV, Dolja VV, Krupovic M, Varsani A, Wolf YI, Yutin N, Zerbini FM, Kuhn JH. Global Organization and Proposed Megataxonomy of the Virus World. Microbiol Mol Biol Rev 2020; 84:e00061-19. [PMID: 32132243 PMCID: PMC7062200 DOI: 10.1128/mmbr.00061-19] [Citation(s) in RCA: 322] [Impact Index Per Article: 80.5] [Indexed: 12/25/2022] Open
Abstract
Viruses and mobile genetic elements are molecular parasites or symbionts that coevolve with nearly all forms of cellular life. The route of virus replication and protein expression is determined by the viral genome type. Comparison of these routes led to the classification of viruses into seven "Baltimore classes" (BCs) that define the major features of virus reproduction. However, recent phylogenomic studies identified multiple evolutionary connections among viruses within each of the BCs as well as between different classes. Due to the modular organization of virus genomes, these relationships defy simple representation as lines of descent but rather form complex networks. Phylogenetic analyses of virus hallmark genes combined with analyses of gene-sharing networks show that replication modules of five BCs (three classes of RNA viruses and two classes of reverse-transcribing viruses) evolved from a common ancestor that encoded an RNA-directed RNA polymerase or a reverse transcriptase. Bona fide viruses evolved from this ancestor on multiple, independent occasions via the recruitment of distinct cellular proteins as capsid subunits and other structural components of virions. The single-stranded DNA (ssDNA) viruses are a polyphyletic class, with different groups evolving by recombination between rolling-circle-replicating plasmids, which contributed the replication protein, and positive-sense RNA viruses, which contributed the capsid protein. The double-stranded DNA (dsDNA) viruses are distributed among several large monophyletic groups and arose via the combination of distinct structural modules with equally diverse replication modules. Phylogenomic analyses reveal the finer structure of evolutionary connections among RNA viruses and reverse-transcribing viruses, ssDNA viruses, and large subsets of dsDNA viruses. Taken together, these analyses allow us to outline the global organization of the virus world. 
Here, we describe the key aspects of this organization and propose a comprehensive hierarchical taxonomy of viruses.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Valerian V Dolja
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
| | - Mart Krupovic
- Institut Pasteur, Archaeal Virology Unit, Department of Microbiology, Paris, France
| | - Arvind Varsani
- The Biodesign Center for Fundamental and Applied Microbiomics, Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town, Observatory, Cape Town, South Africa
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - F Murilo Zerbini
- Departamento de Fitopatologia/Bioagro, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | - Jens H Kuhn
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Frederick, Maryland, USA
|
15
|
Maistrenko OM, Mende DR, Luetge M, Hildebrand F, Schmidt TSB, Li SS, Rodrigues JFM, von Mering C, Pedro Coelho L, Huerta-Cepas J, Sunagawa S, Bork P. Disentangling the impact of environmental and phylogenetic constraints on prokaryotic within-species diversity. The ISME Journal 2020; 14:1247-1259. [PMID: 32047279 PMCID: PMC7174425 DOI: 10.1038/s41396-020-0600-z] [Citation(s) in RCA: 54] [Impact Index Per Article: 13.5] [Received: 08/02/2019] [Revised: 01/21/2020] [Accepted: 01/27/2020] [Indexed: 12/04/2022]
Abstract
Microbial organisms inhabit virtually all environments and encompass a vast biological diversity. The pangenome concept aims to facilitate an understanding of diversity within defined phylogenetic groups. Hence, pangenomes are increasingly used to characterize the strain diversity of prokaryotic species. To understand the interdependence of pangenome features (such as the number of core and accessory genes) and to study the impact of environmental and phylogenetic constraints on the evolution of conspecific strains, we computed pangenomes for 155 phylogenetically diverse species (from ten phyla) using 7,000 high-quality genomes to each of which the respective habitats were assigned. Species habitat ubiquity was associated with several pangenome features. In particular, core-genome size was more important for ubiquity than accessory genome size. In general, environmental preferences had a stronger impact on pangenome evolution than phylogenetic inertia. Environmental preferences explained up to 49% of the variance for pangenome features, compared with 18% by phylogenetic inertia. This observation was robust when the dataset was extended to 10,100 species (59 phyla). The importance of environmental preferences was further accentuated by convergent evolution of pangenome features in a given habitat type across different phylogenetic clades. For example, the soil environment promotes expansion of pangenome size, while host-associated habitats lead to its reduction. Taken together, we explored the global principles of pangenome evolution, quantified the influence of habitat, and phylogenetic inertia on the evolution of pangenomes and identified criteria governing species ubiquity and habitat specificity.
Affiliation(s)
- Oleksandr M Maistrenko
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
| | - Daniel R Mende
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, University of Amsterdam, Amsterdam, 1105 AZ, The Netherlands
| | - Mechthild Luetge
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Institute of Immunobiology, Kantonsspital St. Gallen, 9007, St. Gallen, Switzerland
| | - Falk Hildebrand
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, Norfolk, UK
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
| | - Thomas S B Schmidt
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
| | - Simone S Li
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
| | - João F Matias Rodrigues
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057, Zurich, Switzerland
| | - Christian von Mering
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057, Zurich, Switzerland
| | - Luis Pedro Coelho
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China
| | - Jaime Huerta-Cepas
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
| | - Shinichi Sunagawa
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Department of Biology and Swiss Institute of Bioinformatics, ETH Zürich, Vladimir-Prelog-Weg 4, 8093, Zürich, Switzerland
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany.
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany.
- Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, Heidelberg, Germany.
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany.
|
16
|
Médigue C, Calteau A, Cruveiller S, Gachet M, Gautreau G, Josso A, Lajus A, Langlois J, Pereira H, Planel R, Roche D, Rollin J, Rouy Z, Vallenet D. MicroScope-an integrated resource for community expertise of gene functions and comparative analysis of microbial genomic and metabolic data. Brief Bioinform 2020; 20:1071-1084. [PMID: 28968784 PMCID: PMC6931091 DOI: 10.1093/bib/bbx113] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 05/31/2017] [Revised: 07/17/2017] [Indexed: 12/11/2022] Open
Abstract
The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way that provides practical guidance and help to nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and an original, recently developed method (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly help, through the user community, to maintain accurate resources.
|
17
|
Gautreau G, Bazin A, Gachet M, Planel R, Burlot L, Dubois M, Perrin A, Médigue C, Calteau A, Cruveiller S, Matias C, Ambroise C, Rocha EPC, Vallenet D. PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph. PLoS Comput Biol 2020; 16:e1007732. [PMID: 32191703 PMCID: PMC7108747 DOI: 10.1371/journal.pcbi.1007732] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 11/19/2019] [Revised: 03/31/2020] [Accepted: 02/12/2020] [Indexed: 12/21/2022] Open
Abstract
The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes and don't account for genome organization. We introduce a graph structure to model pangenomes in which nodes represent gene families and edges represent genomic neighborhood. Our method, named PPanGGOLiN, partitions nodes using an Expectation-Maximization algorithm based on multivariate Bernoulli Mixture Model coupled with a Markov Random Field. This approach takes into account the topology of the graph and the presence/absence of genes in pangenomes to classify gene families into persistent, cloud, and one or several shell partitions. By analyzing the partitioned pangenome graphs of isolate genomes from 439 species and metagenome-assembled genomes from 78 species, we demonstrate that our method is effective in estimating the persistent genome. Interestingly, it shows that the shell genome is a key element to understand genome dynamics, presumably because it reflects how genes present at intermediate frequencies drive adaptation of species, and its proportion in genomes is independent of genome size. The graph-based approach proposed by PPanGGOLiN is useful to depict the overall genomic diversity of thousands of strains in a compact structure and provides an effective basis for very large scale comparative genomics. The software is freely available at https://github.com/labgem/PPanGGOLiN.
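The graph structure described in this abstract (nodes are gene families, edges join families that are neighbors in at least one genome) can be illustrated with a minimal sketch. This is not the PPanGGOLiN API or its partitioning algorithm: the function name `build_pangenome_graph`, the input layout (one ordered list of gene-family IDs per genome, treated as a single linear contig) and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def build_pangenome_graph(genomes):
    """Build a toy pangenome graph (hypothetical helper, not PPanGGOLiN code).

    genomes: dict mapping a genome name to an ordered list of gene-family IDs,
             assumed here to describe one linear contig per genome.
    Returns (node_genomes, edge_genomes):
      node_genomes: family -> set of genomes that contain the family
      edge_genomes: frozenset({family_a, family_b}) -> set of genomes in which
                    the two families are immediate neighbors
    """
    node_genomes = defaultdict(set)
    edge_genomes = defaultdict(set)
    for name, families in genomes.items():
        for family in families:
            node_genomes[family].add(name)
        # Consecutive genes on the contig define genomic neighborhood.
        for left, right in zip(families, families[1:]):
            if left != right:  # skip tandem copies of the same family
                edge_genomes[frozenset((left, right))].add(name)
    return dict(node_genomes), dict(edge_genomes)


# Toy input: three genomes described by gene-family IDs (made-up data).
genomes = {
    "g1": ["A", "B", "C"],
    "g2": ["A", "B", "D"],
    "g3": ["A", "C"],
}
nodes, edges = build_pangenome_graph(genomes)

# Occurrence frequency of each family across genomes.
frequency = {family: len(found) / len(genomes) for family, found in nodes.items()}
```

The per-family frequencies and the edge sets are the kind of presence/absence and topology information that the abstract says the statistical model takes into account; the actual classification into persistent, shell and cloud partitions is done by the published method, not by this sketch.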
Affiliation(s)
- Guillaume Gautreau
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Adelme Bazin
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Mathieu Gachet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Rémi Planel
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Laura Burlot
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Mathieu Dubois
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Amandine Perrin
- Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
- Sorbonne Université, Collège doctoral, Paris, France
| | - Claudine Médigue
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Alexandra Calteau
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Stéphane Cruveiller
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
| | - Catherine Matias
- Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université, Université de Paris, Centre National de la Recherche Scientifique, Paris, France
| | - Christophe Ambroise
- Laboratoire de Mathématiques et Modélisation d’Evry, UMR CNRS 8071, Université d’Evry Val d’Essonne, Evry, France
| | - Eduardo P. C. Rocha
- Microbial Evolutionary Genomics, Institut Pasteur, CNRS, UMR3525, Paris, France
| | - David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry, France
|
18
|
Abstract
One of the most widely recognized features of biological systems is their modularity. The modules that constitute biological systems are said to be redeployed and combined across several conditions, thus acting as building blocks. In this work, we analyse to what extent these building blocks are reusable compared with those found in randomized versions of a system. We develop a notion of decompositions of systems into phenotypic building blocks, which allows them to overlap while maximizing the number of times a building block is reused across several conditions. Different biological systems present building blocks whose reusability ranges from single use (e.g. condition specific) to constitutive, although their average reusability is not always higher than that of random equivalents of the system. These decompositions reveal a distinct distribution of building block sizes in real biological systems. This distribution stems, in part, from the peculiar usage pattern of the elements of biological systems, and constitutes a new angle from which to study the evolution of modularity.
Affiliation(s)
- Victor Mireles
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- International Max Planck Research School for Computational Biology and Scientific Computing, Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Tim O F Conrad
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
|
19
|
Mazzolini A, Grilli J, De Lazzari E, Osella M, Lagomarsino MC, Gherardi M. Zipf and Heaps laws from dependency structures in component systems. Phys Rev E 2018; 98:012315. [PMID: 30110773 DOI: 10.1103/physreve.98.012315] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/02/2018] [Indexed: 06/08/2023]
Abstract
Complex natural and technological systems can be considered, on a coarse-grained level, as assemblies of elementary components: for example, genomes as sets of genes or texts as sets of words. On one hand, the joint occurrence of components emerges from architectural and specific constraints in such systems. On the other hand, general regularities may unify different systems, such as the broadly studied Zipf and Heaps laws, respectively concerning the distribution of component frequencies and their number as a function of system size. Dependency structures (i.e., directed networks encoding the dependency relations between the components in a system) were proposed recently as possible organizing principles underlying some of the regularities observed. However, the consequences of this assumption were explored only in binary component systems, where solely the presence or absence of components is considered, and multiple copies of the same component are not allowed. Here we consider a simple model that generates, from a given ensemble of dependency structures, a statistical ensemble of sets of components, allowing for components to appear with any multiplicity. Our model is a minimal extension that is memoryless and therefore accessible to analytical calculations. A mean-field analytical approach (analogous to the "Zipfian ensemble" in the linguistics literature) captures the relevant laws describing the component statistics, as we show by comparison with numerical computations. In particular, we recover a power-law Zipf rank plot, with a set of core components, and a Heaps law displaying three consecutive regimes (linear, sublinear, and saturating) that we characterize quantitatively.
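The two empirical quantities named in this abstract can be computed directly from component lists. The sketch below is only an illustration of the definitions (Zipf: component frequency against rank; Heaps: number of distinct components against system size); it does not implement the dependency-structure model of the paper, and the function names and toy data are assumptions.

```python
from collections import Counter


def zipf_rank_frequency(systems):
    """Return [(rank, component, relative_frequency)] pooled over all systems.

    systems: list of lists of component labels; repeated labels count as
             multiple copies of the same component.
    """
    counts = Counter(component for system in systems for component in system)
    total = sum(counts.values())
    return [
        (rank, component, n / total)
        for rank, (component, n) in enumerate(counts.most_common(), start=1)
    ]


def heaps_points(systems):
    """Return sorted (system_size, distinct_components) pairs, one per system."""
    return sorted((len(system), len(set(system))) for system in systems)


# Toy "genomes" written as lists of gene-family labels (made-up data).
systems = [
    ["a", "a", "b"],
    ["a", "c"],
    ["a", "b", "b", "d"],
]
zipf = zipf_rank_frequency(systems)
heaps = heaps_points(systems)
```

Plotting `zipf` on log-log axes gives the rank plot, and plotting `heaps` gives distinct components against size, which is where the linear, sublinear and saturating regimes mentioned in the abstract would appear for a sufficiently large ensemble.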
Affiliation(s)
- Andrea Mazzolini
- Dipartimento di Fisica and INFN, Università degli Studi di Torino, Via Pietro Giuria 1, 10125 Torino, Italy
| | - Jacopo Grilli
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA
| | - Eleonora De Lazzari
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
| | - Matteo Osella
- Dipartimento di Fisica and INFN, Università degli Studi di Torino, Via Pietro Giuria 1, 10125 Torino, Italy
| | - Marco Cosentino Lagomarsino
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
- CNRS, UMR 7238, Paris, France
- IFOM, Milan, Italy
| | - Marco Gherardi
- Sorbonne Universités, UPMC Univ Paris 06, UMR 7238, Computational and Quantitative Biology, 4 Place Jussieu, Paris, France
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, 20133 Milano, Italy
|
20
|
Corander J, Fraser C, Gutmann MU, Arnold B, Hanage WP, Bentley SD, Lipsitch M, Croucher NJ. Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nat Ecol Evol 2017; 1:1950-1960. [PMID: 29038424 PMCID: PMC5708525 DOI: 10.1038/s41559-017-0337-x] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 04/27/2017] [Accepted: 09/01/2017] [Indexed: 12/21/2022]
Abstract
Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions.
Affiliation(s)
- Jukka Corander
- Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, 00014, Helsinki, Finland
- Department of Biostatistics, University of Oslo, 0317, Oslo, Norway
- Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Christophe Fraser
- Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7LF, UK
| | - Michael U Gutmann
- School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
| | - Brian Arnold
- Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
| | - William P Hanage
- Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
| | - Stephen D Bentley
- Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Marc Lipsitch
- Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Departments of Epidemiology and Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
| | - Nicholas J Croucher
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, W2 1PG, UK.
|
21
|
Apagyi KJ, Fraser C, Croucher NJ. Transformation Asymmetry and the Evolution of the Bacterial Accessory Genome. Mol Biol Evol 2017; 35:575-581. [PMID: 29211859 PMCID: PMC5850275 DOI: 10.1093/molbev/msx309] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/31/2022] Open
Abstract
Bacterial transformation can insert or delete genomic islands (GIs), depending on the donor and recipient genotypes, if a homologous recombination spans the GI’s integration site and includes sufficiently long flanking homologous arms. Combining mathematical models of recombination with experiments using pneumococci found GI insertion rates declined geometrically with the GI’s size. The decrease in acquisition frequency with length (1.08×10⁻³ bp⁻¹) was higher than a previous estimate of the analogous rate at which core genome recombinations terminated. Although most efficient for shorter GIs, transformation-mediated deletion frequencies did not vary consistently with GI length, with removal of 10-kb GIs ∼50% as efficient as acquisition of base substitutions. Fragments of 2 kb, typical of transformation event sizes, could drive all these deletions independent of island length. The strong asymmetry of transformation, and its capacity to efficiently remove GIs, suggests nonmobile accessory loci will decline in frequency without preservation by selection.
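To give a feel for what a geometric decline at the reported per-base rate implies, the following sketch evaluates insertion efficiency relative to a zero-length insert. The functional form `(1 - d) ** L`, the normalization, and the function name are assumptions made for illustration; the abstract reports only the rate, not the exact model used in the paper.

```python
import math


def relative_insertion_rate(gi_length_bp, decay_per_bp=1.08e-3):
    """Relative GI insertion efficiency under an assumed geometric decline.

    gi_length_bp: length of the genomic island in base pairs (non-negative).
    decay_per_bp: per-base decrease in acquisition frequency; the default is
                  the value quoted in the abstract (1.08e-3 per bp).
    Returns a value in (0, 1], equal to 1.0 for a zero-length insert.
    """
    if gi_length_bp < 0:
        raise ValueError("gi_length_bp must be non-negative")
    return (1.0 - decay_per_bp) ** gi_length_bp


# Compare a few illustrative island sizes (in bp).
for length in (0, 500, 1_000, 5_000, 10_000):
    print(f"{length:>6} bp: {relative_insertion_rate(length):.3g}")
```

Under this assumed form, efficiency falls steadily as islands get longer, which is the contrast the abstract draws with deletion, whose frequency did not vary consistently with island length.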
Affiliation(s)
- Katinka J Apagyi
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - Christophe Fraser
- Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Nicholas J Croucher
- MRC Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
|
22
|
Bolotin E, Hershberg R. Horizontally Acquired Genes Are Often Shared between Closely Related Bacterial Species. Front Microbiol 2017; 8:1536. [PMID: 28890711 PMCID: PMC5575156 DOI: 10.3389/fmicb.2017.01536] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/21/2017] [Accepted: 07/28/2017] [Indexed: 01/11/2023] Open
Abstract
Horizontal gene transfer (HGT) serves as an important source of innovation for bacterial species. We used a pangenome-based approach to identify genes that were horizontally acquired by four closely related bacterial species, belonging to the Enterobacteriaceae family. This enabled us to examine the extent to which such closely related species tend to share horizontally acquired genes. We find that a high percent of horizontally acquired genes are shared among these closely related species. Furthermore, we demonstrate that the extent of sharing of horizontally acquired genes among these four closely related species is predictive of the extent to which these genes will be found in additional bacterial species. Finally, we show that acquired genes shared by more species tend to be better optimized for expression within the genomes of their new hosts. Combined, our results demonstrate the existence of a large pool of frequently horizontally acquired genes that have distinct characteristics from horizontally acquired genes that are less frequently shared between species.
Affiliation(s)
- Evgeni Bolotin
- Rachel and Menachem Mendelovitch Evolutionary Processes of Mutation and Natural Selection Research Laboratory, The Rappaport Family Institute for Research in the Medical Sciences, Department of Genetics and Developmental Biology, Technion-Israel Institute of Technology, Haifa, Israel
| | - Ruth Hershberg
- Rachel and Menachem Mendelovitch Evolutionary Processes of Mutation and Natural Selection Research Laboratory, The Rappaport Family Institute for Research in the Medical Sciences, Department of Genetics and Developmental Biology, Technion-Israel Institute of Technology, Haifa, Israel
|
23
|
Marttinen P, Hanage WP. Speciation trajectories in recombining bacterial species. PLoS Comput Biol 2017; 13:e1005640. [PMID: 28671999 PMCID: PMC5542674 DOI: 10.1371/journal.pcbi.1005640] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 02/24/2017] [Revised: 08/03/2017] [Accepted: 06/15/2017] [Indexed: 01/26/2023] Open
Abstract
It is generally agreed that bacterial diversity can be classified into genetically and ecologically cohesive units, but what produces such variation is a topic of intensive research. Recombination may maintain coherent species of frequently recombining bacteria, but the emergence of distinct clusters within a recombining species, and the impact of habitat structure in this process are not well described, limiting our understanding of how new species are created. Here we present a model of bacterial evolution in overlapping habitat space. We show that the amount of habitat overlap determines the outcome for a pair of clusters, which may range from fast clonal divergence with little interaction between the clusters to a stationary population structure, where different clusters maintain an equilibrium distance between each other for an indefinite time. We fit our model to two data sets. In Streptococcus pneumoniae, we find a genomically and ecologically distinct subset, held at a relatively constant genetic distance from the majority of the population through frequent recombination with it, while in Campylobacter jejuni, we find a minority population we predict will continue to diverge at a higher rate. This approach may predict and define speciation trajectories in multiple bacterial species.
Affiliation(s)
- Pekka Marttinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
| | - William P. Hanage
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
|
24
|
Disentangling the effects of selection and loss bias on gene dynamics. Proc Natl Acad Sci U S A 2017; 114:E5616-E5624. [PMID: 28652353 DOI: 10.1073/pnas.1704925114] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/18/2022] Open
Abstract
We combine mathematical modeling of genome evolution with comparative analysis of prokaryotic genomes to estimate the relative contributions of selection and intrinsic loss bias to the evolution of different functional classes of genes and mobile genetic elements (MGE). An exact solution for the dynamics of gene family size was obtained under a linear duplication-transfer-loss model with selection. With the exception of genes involved in information processing, particularly translation, which are maintained by strong selection, the average selection coefficient for most nonparasitic genes is low albeit positive, compatible with observed positive correlation between genome size and effective population size. Free-living microbes evolve under stronger selection for gene retention than parasites. Different classes of MGE show a broad range of fitness effects, from the nearly neutral transposons to prophages, which are actively eliminated by selection. Genes involved in antiparasite defense, on average, incur a fitness cost to the host that is at least as high as the cost of plasmids. This cost is probably due to the adverse effects of autoimmunity and curtailment of horizontal gene transfer caused by the defense systems and selfish behavior of some of these systems, such as toxin-antitoxin and restriction modification modules. Transposons follow a biphasic dynamics, with bursts of gene proliferation followed by decay in the copy number that is quantitatively captured by the model. The horizontal gene transfer to loss ratio, but not duplication to loss ratio, correlates with genome size, potentially explaining increased abundance of neutral and costly elements in larger genomes.
|
25
|
Choudoir MJ, Panke-Buisse K, Andam CP, Buckley DH. Genome Surfing As Driver of Microbial Genomic Diversity. Trends Microbiol 2017; 25:624-636. [PMID: 28283403 DOI: 10.1016/j.tim.2017.02.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/14/2016] [Revised: 02/03/2017] [Accepted: 02/10/2017] [Indexed: 01/20/2023]
Abstract
Historical changes in population size, such as those caused by demographic range expansions, can produce nonadaptive changes in genomic diversity through mechanisms such as gene surfing. We propose that demographic range expansion of a microbial population capable of horizontal gene exchange can result in genome surfing, a mechanism that can cause widespread increase in the pan-genome frequency of genes acquired by horizontal gene exchange. We explain that patterns of genetic diversity within Streptomyces are consistent with genome surfing, and we describe several predictions for testing this hypothesis both in Streptomyces and in other microorganisms.
Affiliation(s)
- Mallory J Choudoir
- School of Integrative Plant Science, Cornell University, Ithaca, NY 14850 USA
| | - Kevin Panke-Buisse
- School of Integrative Plant Science, Cornell University, Ithaca, NY 14850 USA
| | - Cheryl P Andam
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham NH 03824, USA
| | - Daniel H Buckley
- School of Integrative Plant Science, Cornell University, Ithaca, NY 14850 USA.
|
26
|
Two fundamentally different classes of microbial genes. Nat Microbiol 2016; 2:16208. [PMID: 27819663 DOI: 10.1038/nmicrobiol.2016.208] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 06/02/2016] [Accepted: 09/20/2016] [Indexed: 01/15/2023]
Abstract
The evolution of bacterial and archaeal genomes is highly dynamic and involves extensive horizontal gene transfer and gene loss [1-4]. Furthermore, many microbial species appear to have open pangenomes, where each newly sequenced genome contains more than 10% ORFans, that is, genes without detectable homologues in other species [5,6]. Here, we report a quantitative analysis of microbial genome evolution by fitting the parameters of a simple, steady-state evolutionary model to the comparative genomic data on the gene content and gene order similarity between archaeal genomes. The results reveal two sharply distinct classes of microbial genes, one of which is characterized by effectively instantaneous gene replacement, and the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of the size of the prokaryotic genomic universe, which appears to consist of at least a billion distinct genes. Furthermore, the same distribution of constraints is shown to govern the evolution of gene complement and gene order, without the need to invoke long-range conservation or the selfish operon concept [7].
|
27
|
Ku C, Martin WF. A natural barrier to lateral gene transfer from prokaryotes to eukaryotes revealed from genomes: the 70 % rule. BMC Biol 2016; 14:89. [PMID: 27751184 PMCID: PMC5067920 DOI: 10.1186/s12915-016-0315-9] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 06/27/2016] [Accepted: 09/28/2016] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND The literature harbors many claims for lateral gene transfer (LGT) from prokaryotes to eukaryotes. Such claims are typically founded in analyses of genome sequences. It is undisputed that many genes entered the eukaryotic lineage via the origin of mitochondria and the origin of plastids. Claims for lineage-specific LGT to eukaryotes outside the context of organelle origins and claims of continuous LGT to eukaryotic lineages are more problematic. If eukaryotes acquire genes from prokaryotes continuously during evolution, then sequenced eukaryote genomes should harbor evidence for recent LGT, like prokaryotic genomes do. RESULTS Here we devise an approach to investigate 30,358 eukaryotic sequences in the context of 1,035,375 prokaryotic homologs among 2585 phylogenetic trees containing homologs from prokaryotes and eukaryotes. Prokaryote genomes reflect a continuous process of gene acquisition and inheritance, with abundant recent acquisitions showing 80-100 % amino acid sequence identity to their phylogenetic sister-group homologs from other phyla. By contrast, eukaryote genomes show no evidence for either continuous or recent gene acquisitions from prokaryotes. We find that, in general, genes in eukaryotic genomes that share ≥70 % amino acid identity to prokaryotic homologs are genome-specific; that is, they are not found outside individual genome assemblies. CONCLUSIONS Our analyses indicate that eukaryotes do not acquire genes through continual LGT like prokaryotes do. We propose a 70 % rule: Coding sequences in eukaryotic genomes that share more than 70 % amino acid sequence identity to prokaryotic homologs are most likely assembly or annotation artifacts. The findings further uncover that the role of differential loss in eukaryote genome evolution has been vastly underestimated.
Affiliation(s)
- Chuan Ku
- Institute of Molecular Evolution, Heinrich-Heine University, Düsseldorf, Germany.
| | - William F Martin
- Institute of Molecular Evolution, Heinrich-Heine University, Düsseldorf, Germany.
|
28
|
Bolotin E, Hershberg R. Bacterial intra-species gene loss occurs in a largely clocklike manner mostly within a pool of less conserved and constrained genes. Sci Rep 2016; 6:35168. [PMID: 27734920 PMCID: PMC5062063 DOI: 10.1038/srep35168] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 05/04/2016] [Accepted: 09/26/2016] [Indexed: 12/13/2022] Open
Abstract
Gene loss is a major contributor to the evolution of bacterial gene content. Gene loss may occur as a result of shifts in environment leading to changes in the intensity and/or directionality of selection applied for the maintenance of specific genes. Gene loss may also occur in a more neutral manner, when gene functions are lost that were not subject to strong selection to be maintained, irrespective of changes to environment. Here, we used a pangenome-based approach to investigate patterns of gene loss across 15 bacterial species. We demonstrate that gene loss tends to occur mostly within a pool of genes that are less constrained within species, even in those strains from which they are not lost, and less conserved across bacterial species. Our results indicate that shifts in selection, resulting from shifts in environment are not required to explain the majority of gene loss events occurring within a diverse collection of bacterial species. Caution should therefore be taken when attributing differences in gene content to differences in environment.
Affiliation(s)
- Evgeni Bolotin
- Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics and Developmental Biology, the Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel
| | - Ruth Hershberg
- Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics and Developmental Biology, the Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 31096, Israel
|
29
|
Abstract
UNLABELLED Virus genomes are prone to extensive gene loss, gain, and exchange and share no universal genes. Therefore, in a broad-scale study of virus evolution, gene and genome network analyses can complement traditional phylogenetics. We performed an exhaustive comparative analysis of the genomes of double-stranded DNA (dsDNA) viruses by using the bipartite network approach and found a robust hierarchical modularity in the dsDNA virosphere. Bipartite networks consist of two classes of nodes, with nodes in one class, in this case genomes, being connected via nodes of the second class, in this case genes. Such a network can be partitioned into modules that combine nodes from both classes. The bipartite network of dsDNA viruses includes 19 modules that form 5 major and 3 minor supermodules. Of these modules, 11 include tailed bacteriophages, reflecting the diversity of this largest group of viruses. The module analysis quantitatively validates and refines previously proposed nontrivial evolutionary relationships. An expansive supermodule combines the large and giant viruses of the putative order "Megavirales" with diverse moderate-sized viruses and related mobile elements. All viruses in this supermodule share a distinct morphogenetic tool kit with a double jelly roll major capsid protein. Herpesviruses and tailed bacteriophages comprise another supermodule, held together by a distinct set of morphogenetic proteins centered on the HK97-like major capsid protein. Together, these two supermodules cover the great majority of currently known dsDNA viruses. We formally identify a set of 14 viral hallmark genes that comprise the hubs of the network and account for most of the intermodule connections. IMPORTANCE Viruses and related mobile genetic elements are the dominant biological entities on earth, but their evolution is not sufficiently understood and their classification is not adequately developed. 
The key reason is the characteristic high rate of virus evolution that involves not only sequence change but also extensive gene loss, gain, and exchange. Therefore, in the study of virus evolution on a large scale, traditional phylogenetic approaches have limited applicability and have to be complemented by gene and genome network analyses. We applied state-of-the art methods of such analysis to reveal robust hierarchical modularity in the genomes of double-stranded DNA viruses. Some of the identified modules combine highly diverse viruses infecting bacteria, archaea, and eukaryotes, in support of previous hypotheses on direct evolutionary relationships between viruses from the three domains of cellular life. We formally identify a set of 14 viral hallmark genes that hold together the genomic network.
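The bipartite structure described in this abstract (genomes connected to one another only through the genes they carry) can be illustrated with a minimal Python sketch. The genome and gene names below are hypothetical toy data, not taken from the study, and the sketch only builds the two node classes and counts shared genes; it does not attempt the module detection the authors performed.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: genome name -> set of gene families it carries.
genome_genes = {
    "phage_A": {"capsid_HK97", "terminase", "portal"},
    "phage_B": {"capsid_HK97", "terminase"},
    "virus_C": {"capsid_DJR", "packaging_ATPase"},
}

def gene_to_genomes(genome_genes):
    """Invert the mapping: gene node -> set of genome nodes linked to it."""
    index = defaultdict(set)
    for genome, genes in genome_genes.items():
        for gene in genes:
            index[gene].add(genome)
    return dict(index)

def shared_gene_counts(genome_genes):
    """Count the gene nodes that connect each pair of genome nodes."""
    counts = {}
    for a, b in combinations(sorted(genome_genes), 2):
        shared = genome_genes[a] & genome_genes[b]
        if shared:
            counts[(a, b)] = len(shared)
    return counts
```

In this toy example, the two phages are linked through two gene nodes, while the third genome shares none and stays disconnected from them.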
|
30
|
Zamani-Dahaj SA, Okasha M, Kosakowski J, Higgs PG. Estimating the Frequency of Horizontal Gene Transfer Using Phylogenetic Models of Gene Gain and Loss. Mol Biol Evol 2016; 33:1843-57. [DOI: 10.1093/molbev/msw062] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022] Open
|
31
|
Touchon M, Rocha EPC. Coevolution of the Organization and Structure of Prokaryotic Genomes. Cold Spring Harb Perspect Biol 2016; 8:a018168. [PMID: 26729648 DOI: 10.1101/cshperspect.a018168] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 11/25/2022]
Abstract
The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered in synthetic biology.
Affiliation(s)
- Marie Touchon
- Microbial Evolutionary Genomics, Institut Pasteur, 75015 Paris, France; CNRS, UMR3525, 75015 Paris, France
| | - Eduardo P C Rocha
- Microbial Evolutionary Genomics, Institut Pasteur, 75015 Paris, France; CNRS, UMR3525, 75015 Paris, France
|
32
|
Marttinen P, Croucher NJ, Gutmann MU, Corander J, Hanage WP. Recombination produces coherent bacterial species clusters in both core and accessory genomes. Microb Genom 2015; 1:e000038. [PMID: 28348822 PMCID: PMC5320679 DOI: 10.1099/mgen.0.000038] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 08/14/2015] [Accepted: 10/06/2015] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Population samples show bacterial genomes can be divided into a core of ubiquitous genes and accessory genes that are present in a fraction of isolates. The ecological significance of this variation in gene content remains unclear. However, microbiologists agree that a bacterial species should be 'genomically coherent', even though there is no consensus on how this should be determined. RESULTS We use a parsimonious model combining diversification in both the core and accessory genome, including mutation, homologous recombination (HR) and horizontal gene transfer (HGT) introducing new loci, to produce a population of interacting clusters of strains with varying genome content. New loci introduced by HGT may then be transferred on by HR. The model fits well to a systematic population sample of 616 pneumococcal genomes, capturing the major features of the population structure with parameter values that agree well with empirical estimates. CONCLUSIONS The model does not include explicit selection on individual genes, suggesting that crude comparisons of gene content may be a poor predictor of ecological function. We identify a clearly divergent subpopulation of pneumococci that are inconsistent with the model and may be considered genomically incoherent with the rest of the population. These strains have a distinct disease tropism and may be rationally defined as a separate species. We also find deviations from the model that may be explained by recent population bottlenecks or spatial structure.
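The core/accessory division mentioned at the start of this abstract can be sketched in a few lines of Python. This is only an illustration of the definition (core genes occur in every isolate, accessory genes in a fraction of them); the function name and the toy isolates are hypothetical, and the sketch is unrelated to the population model the authors fit.

```python
def split_core_accessory(presence):
    """Split genes into core (in every isolate) and accessory (in only some).

    presence: dict mapping isolate name -> set of gene names it carries.
    Returns a (core, accessory) tuple of sets.
    """
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    all_genes = set().union(*gene_sets)      # every gene seen in any isolate
    core = set.intersection(*gene_sets)      # genes present in all isolates
    return core, all_genes - core

# Hypothetical toy isolates and gene names, for illustration only.
isolates = {
    "iso1": {"geneA", "geneB", "geneC"},
    "iso2": {"geneA", "geneB"},
    "iso3": {"geneA", "geneD"},
}
core, accessory = split_core_accessory(isolates)
```

Real pangenome analyses often relax the threshold (for example, treating genes found in nearly all isolates as core) to tolerate assembly gaps; the strict version above is the simplest reading of "ubiquitous".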
Affiliation(s)
- Pekka Marttinen
- Aalto University, Espoo, Finland
- Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA
| | | | | | | | - William P. Hanage
- Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA
|
33
|
Galardini M, Brilli M, Spini G, Rossi M, Roncaglia B, Bani A, Chiancianesi M, Moretto M, Engelen K, Bacci G, Pini F, Biondi EG, Bazzicalupo M, Mengoni A. Evolution of Intra-specific Regulatory Networks in a Multipartite Bacterial Genome. PLoS Comput Biol 2015; 11:e1004478. [PMID: 26340565 PMCID: PMC4560400 DOI: 10.1371/journal.pcbi.1004478] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 11/24/2014] [Accepted: 07/24/2015] [Indexed: 11/21/2022] Open
Abstract
Reconstruction of the regulatory network is an important step in understanding how organisms control the expression of gene products and therefore phenotypes. Recent studies have pointed out the importance of regulatory network plasticity in bacterial adaptation and evolution. The evolution of such networks within and outside the species boundary is however still obscure. Sinorhizobium meliloti is an ideal species for such study, having three large replicons, many genomes available and a significant knowledge of its transcription factors (TF). Each replicon has a specific functional and evolutionary mark; which might also emerge from the analysis of their regulatory signatures. Here we have studied the plasticity of the regulatory network within and outside the S. meliloti species, looking for the presence of 41 TFs binding motifs in 51 strains and 5 related rhizobial species. We have detected a preference of several TFs for one of the three replicons, and the function of regulated genes was found to be in accordance with the overall replicon functional signature: house-keeping functions for the chromosome, metabolism for the chromid, symbiosis for the megaplasmid. This therefore suggests a replicon-specific wiring of the regulatory network in the S. meliloti species. At the same time a significant part of the predicted regulatory network is shared between the chromosome and the chromid, thus adding an additional layer by which the chromid integrates itself in the core genome. Furthermore, the regulatory network distance was found to be correlated with both promoter regions and accessory genome evolution inside the species, indicating that both pangenome compartments are involved in the regulatory network evolution. 
We also observed that genes which are not included in the species regulatory network are more likely to belong to the accessory genome, indicating that regulatory interactions should also be considered to predict gene conservation in bacterial pangenomes. The influence of transcriptional regulatory networks on the evolution of bacterial pangenomes has not yet been elucidated, even though the role of transcriptional regulation is widely recognized. Using the model symbiont Sinorhizobium meliloti we have predicted the regulatory targets of 41 transcription factors in 51 strains and 5 other rhizobial species, showing a correlation between regulon diversity and pangenome evolution, through upstream sequence diversity and accessory genome composition. We have also shown that genes not wired to the regulatory network are more likely to belong to the accessory genome, thus suggesting that inclusion in the regulatory circuits may be an indicator of gene conservation. We have also highlighted a series of transcription factors that preferentially regulate genes belonging to one of the three replicons of this species, indicating the presence of replicon-specific regulatory modules, with peculiar functional signatures. At the same time the chromid shares a significant part of the regulatory network with the chromosome, indicating an additional way by which this replicon integrates itself in the pangenome.
Affiliation(s)
- Marco Galardini
- Department of Biology, University of Florence, Florence, Italy
| | - Matteo Brilli
- Department of Genomics and Biology of Fruit Crops, Research and Innovation Centre, Fondazione Edmund Mach (FEM), San Michele all’Adige, Italy
| | - Giulia Spini
- Dipartimento di Biotecnologie Agrarie, Sezione di Microbiologia, University of Florence, Florence, Italy
| | - Matteo Rossi
- Department of Biology, University of Florence, Florence, Italy
| | | | - Alessia Bani
- Department of Biology, University of Florence, Florence, Italy
| | | | - Marco Moretto
- Department of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach (FEM), San Michele all’Adige, Italy
| | - Kristof Engelen
- Department of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach (FEM), San Michele all’Adige, Italy
| | - Giovanni Bacci
- Department of Biology, University of Florence, Florence, Italy
- Consiglio per la Ricerca e la Sperimentazione in Agricoltura, Centro di Ricerca per lo Studio delle Relazioni tra Pianta e Suolo (CRA-RPS), Rome, Italy
| | - Francesco Pini
- Interdisciplinary Research Institute USR3078, CNRS-Université Lille Nord de France, Villeneuve d’Ascq, France
| | - Emanuele G. Biondi
- Interdisciplinary Research Institute USR3078, CNRS-Université Lille Nord de France, Villeneuve d’Ascq, France
| | | | - Alessio Mengoni
- Department of Biology, University of Florence, Florence, Italy
|
34
|
Ku C, Nelson-Sathi S, Roettger M, Sousa FL, Lockhart PJ, Bryant D, Hazkani-Covo E, McInerney JO, Landan G, Martin WF. Endosymbiotic origin and differential loss of eukaryotic genes. Nature 2015; 524:427-32. [PMID: 26287458 DOI: 10.1038/nature14963] [Citation(s) in RCA: 191] [Impact Index Per Article: 21.2] [Received: 04/22/2015] [Accepted: 07/20/2015] [Indexed: 01/11/2023]
Abstract
Chloroplasts arose from cyanobacteria, mitochondria arose from proteobacteria. Both organelles have conserved their prokaryotic biochemistry, but their genomes are reduced, and most organelle proteins are encoded in the nucleus. Endosymbiotic theory posits that bacterial genes in eukaryotic genomes entered the eukaryotic lineage via organelle ancestors. It predicts episodic influx of prokaryotic genes into the eukaryotic lineage, with acquisition corresponding to endosymbiotic events. Eukaryotic genome sequences, however, increasingly implicate lateral gene transfer, both from prokaryotes to eukaryotes and among eukaryotes, as a source of gene content variation in eukaryotic genomes, which predicts continuous, lineage-specific acquisition of prokaryotic genes in divergent eukaryotic groups. Here we discriminate between these two alternatives by clustering and phylogenetic analysis of eukaryotic gene families having prokaryotic homologues. Our results indicate (1) that gene transfer from bacteria to eukaryotes is episodic, as revealed by gene distributions, and coincides with major evolutionary transitions at the origin of chloroplasts and mitochondria; (2) that gene inheritance in eukaryotes is vertical, as revealed by extensive topological comparison, sparse gene distributions stemming from differential loss; and (3) that continuous, lineage-specific lateral gene transfer, although it sometimes occurs, does not contribute to long-term gene content evolution in eukaryotic genomes.
Affiliation(s)
- Chuan Ku
- Institute of Molecular Evolution, Heinrich-Heine University, 40225 Düsseldorf, Germany
| | - Shijulal Nelson-Sathi
- Institute of Molecular Evolution, Heinrich-Heine University, 40225 Düsseldorf, Germany
| | - Mayo Roettger
- Institute of Molecular Evolution, Heinrich-Heine University, 40225 Düsseldorf, Germany
| | - Filipa L Sousa
- Institute of Molecular Evolution, Heinrich-Heine University, 40225 Düsseldorf, Germany
| | - Peter J Lockhart
- Institute of Fundamental Sciences, Massey University, Palmerston North 4474, New Zealand
| | - David Bryant
- Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
| | - Einat Hazkani-Covo
- Department of Natural and Life Sciences, The Open University of Israel, Ra'anana 43107, Israel
| | - James O McInerney
- Department of Biology, National University of Ireland, Maynooth, County Kildare, Ireland; Michael Smith Building, The University of Manchester, Oxford Rd, Manchester M13 9PL, UK
| | - Giddy Landan
- Genomic Microbiology Group, Institute of Microbiology, Christian-Albrechts-University of Kiel, 24118 Kiel, Germany
| | - William F Martin
- Institute of Molecular Evolution, Heinrich-Heine University, 40225 Düsseldorf, Germany; Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, 2780-157 Oeiras, Portugal
|
35
|
Fullmer MS, Soucy SM, Gogarten JP. The pan-genome as a shared genomic resource: mutual cheating, cooperation and the black queen hypothesis. Front Microbiol 2015; 6:728. [PMID: 26284032 PMCID: PMC4523029 DOI: 10.3389/fmicb.2015.00728] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 04/27/2015] [Accepted: 07/03/2015] [Indexed: 11/13/2022] Open
Affiliation(s)
- Matthew S Fullmer
- Department of Molecular and Cell Biology, University of Connecticut Storrs, CT, USA
| | - Shannon M Soucy
- Department of Molecular and Cell Biology, University of Connecticut Storrs, CT, USA
| | - Johann Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut Storrs, CT, USA; Institute for Systems Genomics, University of Connecticut Storrs, CT, USA
|
36
|
Bolotin E, Hershberg R. Gene Loss Dominates As a Source of Genetic Variation within Clonal Pathogenic Bacterial Species. Genome Biol Evol 2015; 7:2173-87. [PMID: 26163675 PMCID: PMC4558853 DOI: 10.1093/gbe/evv135] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Indexed: 01/14/2023] Open
Abstract
Some of the most dangerous pathogens such as Mycobacterium tuberculosis and Yersinia pestis evolve clonally. This means that little or no recombination occurs between strains belonging to these species. Paradoxically, although different members of these species show extreme sequence similarity of orthologous genes, some show considerable intraspecies phenotypic variation, the source of which remains elusive. To examine the possible sources of phenotypic variation within clonal pathogenic bacterial species, we carried out an extensive genomic and pan-genomic analysis of the sources of genetic variation available to a large collection of clonal and nonclonal pathogenic bacterial species. We show that while nonclonal species diversify through a combination of changes to gene sequences, gene loss and gene gain, gene loss completely dominates as a source of genetic variation within clonal species. Indeed, gene loss is so prevalent within clonal species as to lead to levels of gene content variation comparable to those found in some nonclonal species that are much more diverged in their gene sequences and that acquire a substantial number of genes horizontally. Gene loss therefore needs to be taken into account as a potential dominant source of phenotypic variation within clonal bacterial species.
Affiliation(s)
- Evgeni Bolotin
- Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics and Developmental Biology, The Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
| | - Ruth Hershberg
- Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics and Developmental Biology, The Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa, Israel
|
37
|
Koonin EV. The Turbulent Network Dynamics of Microbial Evolution and the Statistical Tree of Life. J Mol Evol 2015; 80:244-50. [PMID: 25894542 PMCID: PMC4472940 DOI: 10.1007/s00239-015-9679-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/18/2015] [Accepted: 04/08/2015] [Indexed: 08/30/2023]
Abstract
The wide spread and high rate of gene exchange and loss in the prokaryotic world translate into “network genomics”. The rates of gene gain and loss are comparable with the rate of point mutations but are substantially greater than the duplication rate. Thus, evolution of prokaryotes is primarily shaped by gene gain and loss. These processes are essential to prevent mutational meltdown of microbial populations by stopping Muller’s ratchet and appear to trigger emergence of major novel clades by opening up new ecological niches. At least some bacteria and archaea seem to have evolved dedicated devices for gene transfer. Despite the dominance of gene gain and loss, evolution of genes is intrinsically tree-like. The significant coherence between the topologies of numerous gene trees, particularly those for (nearly) universal genes, is compatible with the concept of a statistical tree of life, which forms the framework for reconstruction of the evolutionary processes in the prokaryotic world.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
|
38
|
Baumdicker F. The site frequency spectrum of dispensable genes. Theor Popul Biol 2015; 100C:13-25. [DOI: 10.1016/j.tpb.2014.12.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/10/2014] [Revised: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 10/24/2022]
|
39
|
Lobkovsky AE, Wolf YI, Koonin EV. Estimation of prokaryotic supergenome size and composition from gene frequency distributions. BMC Genomics 2014; 15 Suppl 6:S14. [PMID: 25572821 PMCID: PMC4240607 DOI: 10.1186/1471-2164-15-s6-s14] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Because prokaryotic genomes experience a rapid flux of genes, selection may act at a higher level than an individual genome. We explore a quantitative model of the distributed genome whereby groups of genomes evolve by acquiring genes from a fixed reservoir which we denote as supergenome. Previous attempts to understand the nature of the supergenome treated genomes as random, independent collections of genes and assumed that the supergenome consists of a small number of homogeneous sub-reservoirs. Here we explore the consequences of relaxing both assumptions. RESULTS We surveyed several methods for estimating the size and composition of the supergenome. The methods assumed that genomes were either random, independent samples of the supergenome or that they evolved from a common ancestor along a known tree via stochastic sampling from the reservoir. The reservoir was assumed to be either a collection of homogeneous sub-reservoirs or alternatively composed of genes with Gamma distributed gain probabilities. Empirical gene frequencies were used to either compute the likelihood of the data directly or first to reconstruct the history of gene gains and then compute the likelihood of the reconstructed numbers of gains. CONCLUSIONS Supergenome size estimates using the empirical gene frequencies directly are not robust with respect to the choice of the model. By contrast, using the gene frequencies and the phylogenetic tree to reconstruct multiple gene gains produces reliable estimates of the supergenome size and indicates that a homogeneous supergenome is more consistent with the data than a supergenome with Gamma distributed gain probabilities.
|
40
|
Grilli J, Romano M, Bassetti F, Cosentino Lagomarsino M. Cross-species gene-family fluctuations reveal the dynamics of horizontal transfers. Nucleic Acids Res 2014; 42:6850-60. [PMID: 24829449 PMCID: PMC4066789 DOI: 10.1093/nar/gku378] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/03/2022] Open
Abstract
Prokaryotes vary their protein repertoire mainly through horizontal transfer and gene loss. To elucidate the links between these processes and the cross-species gene-family statistics, we perform a large-scale data analysis of the cross-species variability of gene-family abundance (the number of members of the family found on a given genome). We find that abundance fluctuations are related to the rate of horizontal transfers. This is rationalized by a minimal theoretical model, which predicts this link. The families that are not captured by the model show abundance profiles that are markedly peaked around a mean value, possibly because of specific abundance selection. Based on these results, we define an abundance variability index that captures a family's evolutionary behavior (and thus some of its relevant functional properties) purely based on its cross-species abundance fluctuations. Analysis and model, combined, show a quantitative link between cross-species family abundance statistics and horizontal transfer dynamics, which can be used to analyze genome ‘flux’. Groups of families with different values of the abundance variability index correspond to genome sub-parts having different plasticity in terms of the level of horizontal exchange allowed by natural selection.
Affiliation(s)
- Jacopo Grilli
- Dipartimento di Fisica e Astronomia "G. Galilei", Università di Padova, Via Marzolo 8, I-35131 Padova, Italy
- Mariacristina Romano
- Dipartimento di Fisica, Università degli Studi di Milano, via Celoria, 16, 20133 Milano, Italy
- Federico Bassetti
- Università di Pavia, Dipartimento di Matematica, via Ferrata 1, 27100 Pavia, Italy
- Marco Cosentino Lagomarsino
- CNRS, UMR 7238, Paris, France; Sorbonne Universités, UPMC Université Paris 06, UMR 7238 Computational and Quantitative Biology, Genomic Physics Group, 15 rue de l'École de Médecine, Paris, France
|
41
|
Epstein B, Sadowsky MJ, Tiffin P. Selection on horizontally transferred and duplicated genes in Sinorhizobium (Ensifer), the root-nodule symbionts of Medicago. Genome Biol Evol 2014; 6:1199-209. [PMID: 24803571 PMCID: PMC4040998 DOI: 10.1093/gbe/evu090] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 01/02/2023]
Abstract
Structural variation, including variation in gene copy number and presence or absence of genes, is a widespread and important source of genomic variation. We used whole-genome DNA sequences from 48 strains of Sinorhizobium (recently renamed Ensifer), including 20 strains of Sinorhizobium meliloti and 12 strains of S. medicae that were the focus of the analyses, to study the fitness effects of new structural variants created by duplication and horizontal gene transfer. We find that derived duplicated and horizontally transferred (HT) genes segregate at lower frequency than synonymous and nonsynonymous nucleotide variants in S. meliloti and S. medicae. Furthermore, the relative frequencies of different types of variants are more similar in S. medicae than in S. meliloti, the species with the larger effective population size. These results are consistent with the hypothesis that most duplications and HT genes have deleterious effects. Diversity of duplications, as measured by segregating duplicated genes per gene, is greater than nucleotide diversity, consistent with a high rate of duplication. Our results suggest that the vast majority of structural variants found among closely related bacterial strains are short-lived and unlikely to be involved in species-wide adaptation.
Affiliation(s)
- Brendan Epstein
- Department of Plant Biology, University of Minnesota; School of Biological Sciences, Washington State University
- Michael J Sadowsky
- Department of Soil, Water, and Climate, University of Minnesota; BioTechnology Institute, Saint Paul, MN
- Peter Tiffin
- Department of Plant Biology, University of Minnesota
|
42
|
Baumdicker F, Pfaffelhuber P. The infinitely many genes model with horizontal gene transfer. Electron J Probab 2014. [DOI: 10.1214/ejp.v19-2642] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
|
43
|
Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]
|