1
Dewar AE, Hao C, Belcher LJ, Ghoul M, West SA. Bacterial lifestyle shapes pangenomes. Proc Natl Acad Sci U S A 2024; 121:e2320170121. [PMID: 38743630 PMCID: PMC11126918 DOI: 10.1073/pnas.2320170121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 04/06/2024] [Indexed: 05/16/2024] Open
Abstract
Pangenomes vary across bacteria. Some species have fluid pangenomes, with a high proportion of genes varying between individual genomes. Other species have less fluid pangenomes, with different genomes tending to contain the same genes. Two main hypotheses have been suggested to explain this variation: differences in species' bacterial lifestyle and effective population size. However, previous studies have not been able to test between these hypotheses because the different features of lifestyle and effective population size are highly correlated with each other, and phylogenetically conserved, making it hard to disentangle their relative importance. We used phylogeny-based analyses, across 126 bacterial species, to tease apart the causal role of different factors. We found that pangenome fluidity was lower in i) host-associated compared with free-living species and ii) host-associated species that are obligately dependent on a host, live inside cells, and are more pathogenic and less motile. In contrast, we found no support for the competing hypothesis that larger effective population sizes lead to more fluid pangenomes. Effective population size appears to correlate with pangenome variation because it is also driven by bacterial lifestyle, rather than because of a causal relationship.
Affiliation(s)
- Anna E. Dewar
  - Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Chunhui Hao
  - Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Melanie Ghoul
  - Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
- Stuart A. West
  - Department of Biology, University of Oxford, Oxford OX1 3SZ, United Kingdom
2
He L, Huang R, Chen H, Zhao L, Zhang Z. Discovery and characterization of a novel pathogen Erwinia pyri sp. nov. associated with pear dieback: taxonomic insights and genomic analysis. Front Microbiol 2024; 15:1365685. [PMID: 38784818 PMCID: PMC11111954 DOI: 10.3389/fmicb.2024.1365685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 04/08/2024] [Indexed: 05/25/2024] Open
Abstract
In 2022, a novel disease similar to pear fire blight was found in a pear orchard in Zhangye City, Gansu Province, China. The disease mainly damages the branches, leaves, and fruits of the plant. To identify the pathogen, tissue isolation and pathogenicity testing (inoculating the potential pathogen on healthy plant tissues) were conducted. Furthermore, a comprehensive analysis encompassing the pathogen's morphological, physiological, and biochemical characteristics and whole-genome sequencing was conducted. The results showed that among the eight isolates, the symptoms on the detached leaves and fruits inoculated with isolate DE2 were identical to those observed in the field. Verifying Koch's postulates confirmed that DE2 was the pathogenic bacterium that causes the disease. Based on a 16S rRNA phylogenetic tree, isolate DE2 belongs to the genus Erwinia. Biolog and API 20E results also indicated that isolate DE2 is an undescribed species of Erwinia. Isolate DE2 was negative for oxidase. Subsequently, the complete genome sequence of isolate DE2 was determined and compared to the complete genome sequences of 29 other Erwinia species based on digital DNA-DNA hybridization (dDDH) and average nucleotide identity (ANI) analyses. The ANI and dDDH values between strain DE2 and Erwinia species were both below the species thresholds (ANI < 95-96%, dDDH<70%), suggesting that isolate DE2 is a new species of Erwinia. We will temporarily name strain DE2 as Erwinia pyri sp. nov. There were 548 predicted virulence factors in the genome of strain DE2, comprising 534 on the chromosome and 5 in the plasmids. The whole genome sequence of strain DE2 has been submitted to the NCBI database (ASM3075845v1) with accession number GCA_030758455.1. The strain DE2 has been preserved at the China Center for Type Culture Collection (CCTCC) under the deposit number CCTCC AB 2024080. 
This study represents the initial report of a potentially new bacterial species in the genus Erwinia that causes a novel pear dieback disease. The findings provide a valuable strain resource for the study of the genus Erwinia and establish a robust theoretical foundation for the prevention and control of emerging pear dieback diseases.
Affiliation(s)
- Zhenfen Zhang
  - Key Laboratory of Grassland Ecosystem, Ministry of Education, Sino-U.S. Centers for Grazing Land Ecosystem Sustainability, Ministry of Science and Technology, Pratacultural College, Gansu Agricultural University, Lanzhou, China
3
Dmitrijeva M, Tackmann J, Matias Rodrigues JF, Huerta-Cepas J, Coelho LP, von Mering C. A global survey of prokaryotic genomes reveals the eco-evolutionary pressures driving horizontal gene transfer. Nat Ecol Evol 2024; 8:986-998. [PMID: 38443606 DOI: 10.1038/s41559-024-02357-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Accepted: 02/05/2024] [Indexed: 03/07/2024]
Abstract
Horizontal gene transfer, the exchange of genetic material through means other than reproduction, is a fundamental force in prokaryotic genome evolution. Genomic persistence of horizontally transferred genes has been shown to be influenced by both ecological and evolutionary factors. However, there is limited availability of ecological information about species other than the habitats from which they were isolated, which has prevented a deeper exploration of ecological contributions to horizontal gene transfer. Here we focus on transfers detected through comparison of individual gene trees to the species tree, assessing the distribution of gene-exchanging prokaryotes across over a million environmental sequencing samples. By analysing detected horizontal gene transfer events, we show distinct functional profiles for recent versus old events. Although most genes transferred are part of the accessory genome, genes transferred earlier in evolution tend to be more ubiquitous within present-day species. We find that co-occurring, interacting and high-abundance species tend to exchange more genes. Finally, we show that host-associated specialist species are most likely to exchange genes with other host-associated specialist species, whereas species found across different habitats have similar gene exchange rates irrespective of their preferred habitat. Our study covers an unprecedented scale of integrated horizontal gene transfer and environmental information, highlighting broad eco-evolutionary trends.
Affiliation(s)
- Marija Dmitrijeva
  - Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zürich, Zurich, Switzerland
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zurich, Switzerland
- Janko Tackmann
  - Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zürich, Zurich, Switzerland
- Jaime Huerta-Cepas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM)-Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, Spain
- Luis Pedro Coelho
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
- Christian von Mering
  - Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zürich, Zurich, Switzerland
4
Kehlet-Delgado H, Montoya AP, Jensen KT, Wendlandt CE, Dexheimer C, Roberts M, Torres Martínez L, Friesen ML, Griffitts JS, Porter SS. The evolutionary genomics of adaptation to stress in wild rhizobium bacteria. Proc Natl Acad Sci U S A 2024; 121:e2311127121. [PMID: 38507447 PMCID: PMC10990125 DOI: 10.1073/pnas.2311127121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 02/08/2024] [Indexed: 03/22/2024] Open
Abstract
Microbiota comprise the bulk of life's diversity, yet we know little about how populations of microbes accumulate adaptive diversity across natural landscapes. Adaptation to stressful soil conditions in plants provides seminal examples of adaptation in response to natural selection via allelic substitution. For microbes symbiotic with plants however, horizontal gene transfer allows for adaptation via gene gain and loss, which could generate fundamentally different evolutionary dynamics. We use comparative genomics and genetics to elucidate the evolutionary mechanisms of adaptation to physiologically stressful serpentine soils in rhizobial bacteria in western North American grasslands. In vitro experiments demonstrate that the presence of a locus of major effect, the nre operon, is necessary and sufficient to confer adaptation to nickel, a heavy metal enriched to toxic levels in serpentine soil, and a major axis of environmental soil chemistry variation. We find discordance between inferred evolutionary histories of the core genome and nreAXY genes, which often reside in putative genomic islands. This suggests that the evolutionary history of this adaptive variant is marked by frequent losses, and/or gains via horizontal acquisition across divergent rhizobium clades. However, different nre alleles confer distinct levels of nickel resistance, suggesting allelic substitution could also play a role in rhizobium adaptation to serpentine soil. These results illustrate that the interplay between evolution via gene gain and loss and evolution via allelic substitution may underlie adaptation in wild soil microbiota. Both processes are important to consider for understanding adaptive diversity in microbes and improving stress-adapted microbial inocula for human use.
Affiliation(s)
- Kyson T. Jensen
  - Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602
- Miles Roberts
  - School of Biological Sciences, Washington State University, Vancouver, WA 98686
- Maren L. Friesen
  - Department of Plant Pathology, Washington State University, Pullman, WA 99164
  - Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164
- Joel S. Griffitts
  - Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602
- Stephanie S. Porter
  - School of Biological Sciences, Washington State University, Vancouver, WA 98686
5
Cooper AL, Low A, Wong A, Tamber S, Blais BW, Carrillo CD. Modeling the limits of detection for antimicrobial resistance genes in agri-food samples: a comparative analysis of bioinformatics tools. BMC Microbiol 2024; 24:31. [PMID: 38245666 PMCID: PMC10799530 DOI: 10.1186/s12866-023-03148-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 12/07/2023] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Although the spread of antimicrobial resistance (AMR) through food and its production poses a significant concern, there is limited research on the prevalence of AMR bacteria in various agri-food products. Sequencing technologies are increasingly being used to track the spread of AMR genes (ARGs) in bacteria, and metagenomics has the potential to bypass some of the limitations of single isolate characterization by allowing simultaneous analysis of the agri-food product microbiome and associated resistome. However, metagenomics may still be hindered by methodological biases, presence of eukaryotic DNA, and difficulties in detecting low abundance targets within an attainable sequence coverage. The goal of this study was to assess whether limits of detection of ARGs in agri-food metagenomes were influenced by sample type and bioinformatic approaches. RESULTS We simulated metagenomes containing different proportions of AMR pathogens and analysed them for taxonomic composition and ARGs using several common bioinformatic tools. Kraken2/Bracken estimates of species abundance were closest to expected values. However, analysis by both Kraken2 and Bracken indicated the presence of organisms not included in the synthetic metagenomes. Metaphlan3/Metaphlan4 analysis of community composition was more specific but with lower sensitivity than the Kraken2/Bracken analysis. Accurate detection of ARGs dropped drastically below 5X isolate genome coverage. However, it was sometimes possible to detect ARGs and closely related alleles at lower coverage levels if using a lower ARG-target coverage cutoff (< 80%). While KMA and CARD-RGI only predicted presence of expected ARG-targets or closely related gene-alleles, SRST2 (which allows reads to map to multiple targets) falsely reported presence of distantly related ARGs at all isolate genome coverage levels.
The presence of background microbiota in metagenomes influenced the accuracy of ARG detection by KMA, resulting in mcr-1 detection at 0.1X isolate coverage in the lettuce but not in the beef metagenome. CONCLUSIONS This study demonstrates accurate detection of ARGs in synthetic metagenomes using various bioinformatic methods, provided that reads from the ARG-encoding organism exceed approximately 5X isolate coverage (i.e. 0.4% of a 40 million read metagenome). While lowering thresholds for target gene detection improved sensitivity, this led to the identification of alternative ARG-alleles, potentially confounding the identification of critical ARGs in the resistome. Further advancements in sequencing technologies providing increased coverage depth or extended read lengths may improve ARG detection in agri-food metagenomic samples, enabling use of this approach for tracking clinically important ARGs in agri-food samples.
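The coverage figure in the conclusions can be checked with a quick back-of-the-envelope calculation. The sketch below assumes a genome of about 4.8 Mbp and 150 bp reads; neither value is stated in the abstract, and both are hypothetical, chosen only to illustrate how 5X isolate coverage can correspond to roughly 0.4% of a 40 million read metagenome.

```python
def reads_for_coverage(coverage: float, genome_size_bp: int, read_length_bp: int) -> float:
    """Number of reads needed to reach a given mean depth of coverage."""
    return coverage * genome_size_bp / read_length_bp


def fraction_of_metagenome(reads_needed: float, total_reads: int) -> float:
    """Share of all metagenome reads that must come from the target organism."""
    return reads_needed / total_reads


# Hypothetical values, not taken from the abstract:
GENOME_SIZE_BP = 4_800_000   # assumed isolate genome size
READ_LENGTH_BP = 150         # assumed read length

# Values stated in the abstract:
TARGET_COVERAGE = 5          # 5X isolate genome coverage
TOTAL_READS = 40_000_000     # 40 million read metagenome

reads_needed = reads_for_coverage(TARGET_COVERAGE, GENOME_SIZE_BP, READ_LENGTH_BP)
fraction = fraction_of_metagenome(reads_needed, TOTAL_READS)
print(f"{reads_needed:,.0f} reads = {fraction:.2%} of the metagenome")
```

Under these assumptions the required share scales linearly with genome size and inversely with read length, so a larger genome or shorter reads would push the threshold above 0.4%.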
Affiliation(s)
- Ashley L Cooper
  - Research and Development, Ottawa Laboratory (Carling), Canadian Food Inspection Agency, Ottawa, ON, Canada
  - Department of Biology, Carleton University, Ottawa, ON, Canada
- Andrew Low
  - Research and Development, Ottawa Laboratory (Carling), Canadian Food Inspection Agency, Ottawa, ON, Canada
- Alex Wong
  - Department of Biology, Carleton University, Ottawa, ON, Canada
- Sandeep Tamber
  - Microbiology Research Division, Bureau of Microbial Hazards, Health Canada, Ottawa, ON, Canada
- Burton W Blais
  - Research and Development, Ottawa Laboratory (Carling), Canadian Food Inspection Agency, Ottawa, ON, Canada
  - Department of Biology, Carleton University, Ottawa, ON, Canada
- Catherine D Carrillo
  - Research and Development, Ottawa Laboratory (Carling), Canadian Food Inspection Agency, Ottawa, ON, Canada
  - Department of Biology, Carleton University, Ottawa, ON, Canada
6
Beavan A, Domingo-Sananes MR, McInerney JO. Contingency, repeatability, and predictability in the evolution of a prokaryotic pangenome. Proc Natl Acad Sci U S A 2024; 121:e2304934120. [PMID: 38147560 PMCID: PMC10769857 DOI: 10.1073/pnas.2304934120] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 03/27/2023] [Accepted: 11/05/2023] [Indexed: 12/28/2023] Open
Abstract
Pangenomes exhibit remarkable variability in many prokaryotic species, much of which is maintained through the processes of horizontal gene transfer and gene loss. Repeated acquisitions of near-identical homologs can easily be observed across pangenomes, leading to the question of whether these parallel events potentiate similar evolutionary trajectories, or whether the remarkably different genetic backgrounds of the recipients mean that postacquisition evolutionary trajectories end up being quite different. In this study, we present a machine learning method that predicts the presence or absence of genes in the Escherichia coli pangenome based on complex patterns of the presence or absence of other accessory genes within a genome. Our analysis leverages the repeated transfer of genes through the E. coli pangenome to observe patterns of repeated evolution following similar events. We find that the presence or absence of a substantial set of genes is highly predictable from other genes alone, indicating that selection potentiates and maintains gene-gene co-occurrence and avoidance relationships deterministically over long-term bacterial evolution and is robust to differences in host evolutionary history. We propose that at least part of the pangenome can be understood as a set of genes with relationships that govern their likely cohabitants, analogous to an ecosystem's set of interacting organisms. Our findings indicate that intragenomic gene fitness effects may be key drivers of prokaryotic evolution, influencing the repeated emergence of complex gene-gene relationships across the pangenome.
Affiliation(s)
- Alan Beavan
  - School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
- Maria Rosa Domingo-Sananes
  - School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
  - School of Science and Technology, Nottingham Trent University, Nottingham NG1 4FQ, United Kingdom
- James O. McInerney
  - School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
7
Katriel G, Mahanaymi U, Brezner S, Kezel N, Koutschan C, Zeilberger D, Steel M, Snir S. Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory. Syst Biol 2023; 72:1403-1417. [PMID: 37862116 DOI: 10.1093/sysbio/syad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/01/2023] [Accepted: 10/05/2023] [Indexed: 10/22/2023] Open
Abstract
The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed (the ordered orthology DB, based on a new version of the EggNOG database) to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches.
Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.
Affiliation(s)
- Guy Katriel
  - Department of Mathematics, Braude College of Engineering, Karmiel, Israel
- Udi Mahanaymi
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Shelly Brezner
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Noor Kezel
  - Department of Mathematics, University of Haifa, Haifa, Israel
- Doron Zeilberger
  - Department of Mathematics, Rutgers University, New Brunswick, NJ, USA
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
8
Karaś P, Kochanowicz K, Pitek M, Domanski P, Obuchowski I, Tomiczek B, Liberek K. Evolution towards simplicity in bacterial small heat shock protein system. eLife 2023; 12:RP89813. [PMID: 38063373 PMCID: PMC10708888 DOI: 10.7554/elife.89813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023] Open
Abstract
Evolution can tinker with multi-protein machines and replace them with simpler single-protein systems performing equivalent functions in an equally efficient manner. It is unclear how, on a molecular level, such simplification can arise. With ancestral reconstruction and biochemical analysis, we have traced the evolution of bacterial small heat shock proteins (sHsp), which help to refold proteins from aggregates using either two proteins with different functions (IbpA and IbpB) or a secondarily single sHsp that performs both functions in an equally efficient way. Secondarily single sHsp evolved from IbpA, an ancestor specialized in strong substrate binding. Evolution of an intermolecular binding site drove the alteration of substrate binding properties, as well as the formation of higher-order oligomers. Upon two mutations in the α-crystallin domain, secondarily single sHsp interacts with aggregated substrates less tightly. Paradoxically, less efficient binding positively influences the ability of sHsp to stimulate substrate refolding, since the dissociation of sHsp from aggregates is required to initiate Hsp70-Hsp100-dependent substrate refolding. After the loss of a partner, IbpA took over its role in facilitating the sHsp dissociation from an aggregate by weakening the interaction with the substrate, which became beneficial for the refolding process. We show that the same two amino acids introduced in modern-day systems define whether IbpA acts as a single sHsp or obligatorily cooperates with an IbpB partner. Our discoveries illuminate how one sequence has evolved to encode functions previously performed by two distinct proteins.
Affiliation(s)
- Piotr Karaś
  - Intercollegiate Faculty of Biotechnology UG-MUG, University of Gdansk, Gdańsk, Poland
- Klaudia Kochanowicz
  - Intercollegiate Faculty of Biotechnology UG-MUG, University of Gdansk, Gdańsk, Poland
- Marcin Pitek
  - Intercollegiate Faculty of Biotechnology UG-MUG, University of Gdansk, Gdańsk, Poland
- Przemyslaw Domanski
  - Intercollegiate Faculty of Biotechnology UG-MUG, University of Gdansk, Gdańsk, Poland
- Igor Obuchowski
  - Intercollegiate Faculty of Biotechnology UG-MUG, University of Gdansk, Gdańsk, Poland
- Bartlomiej Tomiczek
  - Intercollegiate Faculty of Biotechnology UG-MUG, University of Gdansk, Gdańsk, Poland
- Krzysztof Liberek
  - Intercollegiate Faculty of Biotechnology UG-MUG, University of Gdansk, Gdańsk, Poland
9
Raimondeau P, Bianconi ME, Pereira L, Parisod C, Christin PA, Dunning LT. Lateral gene transfer generates accessory genes that accumulate at different rates within a grass lineage. New Phytol 2023; 240:2072-2084. [PMID: 37793435 DOI: 10.1111/nph.19272] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/05/2023] [Accepted: 08/30/2023] [Indexed: 10/06/2023]
Abstract
Lateral gene transfer (LGT) is the movement of DNA between organisms without sexual reproduction. The acquired genes represent genetic novelties that have independently evolved in the donor's genome. Phylogenetic methods have shown that LGT is widespread across the entire grass family, although we know little about the underlying dynamics. We identify laterally acquired genes in five de novo reference genomes from the same grass genus (four Alloteropsis semialata and one Alloteropsis angusta). Using additional resequencing data for a further 40 Alloteropsis individuals, we place the acquisition of each gene onto a phylogeny using stochastic character mapping, and then infer rates of gains and losses. We detect 168 laterally acquired genes in the five reference genomes (32-100 per genome). Exponential decay models indicate that the rate of LGT acquisitions (6-28 per Ma) and subsequent losses (11-24% per Ma) varied significantly among lineages. Laterally acquired genes were lost at a higher rate than vertically inherited loci (0.02-0.8% per Ma). This high turnover creates intraspecific gene content variation, with a preponderance of them occurring as accessory genes in the Alloteropsis pangenome. This rapid turnover generates standing variation that can ultimately fuel local adaptation.
Affiliation(s)
- Pauline Raimondeau
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
  - Laboratoire Evolution et Diversité Biologique, UMR5174, CNRS/IRD/Université Toulouse 3, Toulouse, 31062, France
- Matheus E Bianconi
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Lara Pereira
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
- Christian Parisod
  - Department of Biology, University of Fribourg, Chemin du Musée 10, Fribourg, 1700, Switzerland
- Pascal-Antoine Christin
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
  - Department of Biology, University of Fribourg, Chemin du Musée 10, Fribourg, 1700, Switzerland
- Luke T Dunning
  - Ecology and Evolutionary Biology, School of Biosciences, University of Sheffield, Western Bank, Sheffield, S10 2TN, UK
10
Altae-Tran H, Shmakov SA, Makarova KS, Wolf YI, Kannan S, Zhang F, Koonin EV. Diversity, evolution, and classification of the RNA-guided nucleases TnpB and Cas12. Proc Natl Acad Sci U S A 2023; 120:e2308224120. [PMID: 37983496 PMCID: PMC10691335 DOI: 10.1073/pnas.2308224120] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 05/24/2023] [Accepted: 09/19/2023] [Indexed: 11/22/2023] Open
Abstract
The TnpB proteins are transposon-associated RNA-guided nucleases that are among the most abundant proteins encoded in bacterial and archaeal genomes, but whose functions in the transposon life cycle remain unknown. TnpB appears to be the evolutionary ancestor of Cas12, the effector nuclease of type V CRISPR-Cas systems. We performed a comprehensive census of TnpBs in archaeal and bacterial genomes and constructed a phylogenetic tree on which we mapped various features of these proteins. In multiple branches of the tree, the catalytic site of the TnpB nuclease is rearranged, demonstrating structural and probably biochemical malleability of this enzyme. We identified numerous cases of apparent recruitment of TnpB for other functions of which the most common is the evolution of type V CRISPR-Cas effectors on about 50 independent occasions. In many other cases of more radical exaptation, the catalytic site of the TnpB nuclease is apparently inactivated, suggesting a regulatory function, whereas in others, the activity appears to be retained, indicating that the recruited TnpB functions as a nuclease, for example, as a toxin. These findings demonstrate remarkable evolutionary malleability of the TnpB scaffold and provide extensive opportunities for further exploration of RNA-guided biological systems as well as multiple applications.
Affiliation(s)
- Han Altae-Tran
  - HHMI, Cambridge, MA 02139
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Sergey A. Shmakov
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
- Kira S. Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
- Yuri I. Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
- Soumya Kannan
  - HHMI, Cambridge, MA 02139
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Feng Zhang
  - HHMI, Cambridge, MA 02139
  - Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894
11
Manzano-Morales S, Liu Y, González-Bodí S, Huerta-Cepas J, Iranzo J. Comparison of gene clustering criteria reveals intrinsic uncertainty in pangenome analyses. Genome Biol 2023; 24:250. [PMID: 37904249 PMCID: PMC10614367 DOI: 10.1186/s13059-023-03089-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND: A key step for comparative genomics is to group open reading frames into functionally and evolutionarily meaningful gene clusters. Gene clustering is complicated by intraspecific duplications and horizontal gene transfers that are frequent in prokaryotes. In consequence, gene clustering methods must deal with a trade-off between identifying vertically transmitted representatives of multicopy gene families, which are recognizable by synteny conservation, and retrieving complete sets of species-level orthologs. We studied the implications of adopting homology, orthology, or synteny conservation as formal criteria for gene clustering by performing comparative analyses of 125 prokaryotic pangenomes. RESULTS: Clustering criteria affect pangenome functional characterization, core genome inference, and reconstruction of ancestral gene content to different extents. Species-wise estimates of pangenome and core genome sizes change by the same factor when using different clustering criteria, allowing robust cross-species comparisons regardless of the clustering criterion. However, cross-species comparisons of genome plasticity and functional profiles are substantially affected by inconsistencies among clustering criteria. Such inconsistencies are driven not only by mobile genetic elements, but also by genes involved in defense, secondary metabolism, and other accessory functions. In some pangenome features, the variability attributed to methodological inconsistencies can even exceed the effect sizes of ecological and phylogenetic variables. CONCLUSIONS: Choosing an appropriate criterion for gene clustering is critical to conduct unbiased pangenome analyses. We provide practical guidelines to choose the right method depending on the research goals and the quality of genome assemblies, and a benchmarking dataset to assess the robustness and reproducibility of future comparative studies.
Affiliation(s)
- Saioa Manzano-Morales
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Barcelona Supercomputing Centre (BSC-CNS) - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Yang Liu
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou, China
| | - Sara González-Bodí
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain
| | - Jaime Huerta-Cepas
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain.
| | - Jaime Iranzo
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Madrid, Spain.
- Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain.
|
12
|
van Dijk B, Buffard P, Farr AD, Giersdorf F, Meijer J, Dutilh BE, Rainey PB. Identifying and tracking mobile elements in evolving compost communities yields insights into the nanobiome. ISME Communications 2023; 3:90. [PMID: 37640834 PMCID: PMC10462680 DOI: 10.1038/s43705-023-00294-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/17/2023] [Revised: 08/02/2023] [Accepted: 08/08/2023] [Indexed: 08/31/2023]
Abstract
Microbial evolution is driven by rapid changes in gene content mediated by horizontal gene transfer (HGT). While mobile genetic elements (MGEs) are important drivers of gene flux, the nanobiome, the zoo of Darwinian replicators that depend on microbial hosts, remains poorly characterised. New approaches are necessary to increase our understanding beyond MGEs shaping individual populations, towards their impacts on complex microbial communities. A bioinformatic pipeline (xenoseq) was developed to cross-compare metagenomic samples from microbial consortia evolving in parallel, with the aim of identifying MGE dissemination, and was applied to compost communities that underwent periodic mixing of MGEs. We show that xenoseq can distinguish movement of MGEs from demographic changes in community composition that otherwise confound identification, and furthermore demonstrate the discovery of various unexpected entities. Of particular interest was a nanobacterium of the candidate phyla radiation (CPR) which is closely related to a species identified in groundwater ecosystems (Candidatus Saccharibacterium), and appears to have a parasitic lifestyle. We also highlight another prolific mobile element, a 313 kb plasmid hosted by a Cellvibrio lineage. The host was predicted to be capable of nitrogen fixation, and acquisition of the plasmid coincides with increased ammonia production. Taken together, our data show that new experimental strategies combined with bioinformatic analyses of metagenomic data stand to provide insight into the nanobiome as a driver of microbial community evolution.
Affiliation(s)
- Bram van Dijk
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany.
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Utrecht, the Netherlands.
| | - Pauline Buffard
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Andrew D Farr
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Franz Giersdorf
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Jeroen Meijer
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Utrecht, the Netherlands
| | - Bas E Dutilh
- Theoretical Biology and Bioinformatics, Department of Biology, Science for Life, Utrecht University, Utrecht, the Netherlands
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Jena, Germany
| | - Paul B Rainey
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, Germany.
- Laboratory of Biophysics and Evolution, CBI, ESPCI Paris, Université PSL CNRS, Paris, France.
|
13
|
O'Donnell S, Yue JX, Saada OA, Agier N, Caradec C, Cokelaer T, De Chiara M, Delmas S, Dutreux F, Fournier T, Friedrich A, Kornobis E, Li J, Miao Z, Tattini L, Schacherer J, Liti G, Fischer G. Telomere-to-telomere assemblies of 142 strains characterize the genome structural landscape in Saccharomyces cerevisiae. Nat Genet 2023; 55:1390-1399. [PMID: 37524789 PMCID: PMC10412453 DOI: 10.1038/s41588-023-01459-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 11/01/2022] [Accepted: 06/26/2023] [Indexed: 08/02/2023]
Abstract
Pangenomes provide access to an accurate representation of the genetic diversity of species, both in terms of sequence polymorphisms and structural variants (SVs). Here we generated the Saccharomyces cerevisiae Reference Assembly Panel (ScRAP) comprising reference-quality genomes for 142 strains representing the species' phylogenetic and ecological diversity. The ScRAP includes phased haplotype assemblies for several heterozygous diploid and polyploid isolates. We identified circa (ca.) 4,800 nonredundant SVs that provide a broad view of the genomic diversity, including the dynamics of telomere length and transposable elements. We uncovered frequent cases of complex aneuploidies where large chromosomes underwent large deletions and translocations. We found that SVs can impact gene expression near the breakpoints and substantially contribute to gene repertoire evolution. We also discovered that horizontally acquired regions insert at chromosome ends and can generate new telomeres. Overall, the ScRAP demonstrates the benefit of a pangenome in understanding genome evolution at population scale.
Affiliation(s)
- Samuel O'Donnell
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Jia-Xing Yue
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
| | - Omar Abou Saada
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Nicolas Agier
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Claudia Caradec
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Thomas Cokelaer
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
| | | | - Stéphane Delmas
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France
| | - Fabien Dutreux
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Téo Fournier
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Anne Friedrich
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Etienne Kornobis
- Biomics Technological Platform, Center for Technological Resources and Research (C2RT), Institut Pasteur, Paris, France
- Bioinformatics and Biostatistics Hub, Computational Biology Department, Institut Pasteur, Paris, France
| | - Jing Li
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France
| | - Zepu Miao
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Sun Yat-sen University Cancer Center, Guangzhou, China
| | | | | | - Gianni Liti
- Université Côte d'Azur, CNRS, INSERM, IRCAN, Nice, France.
| | - Gilles Fischer
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, Paris, France.
|
14
|
Singh RP, Kumari K, Sharma PK, Ma Y. Characterization and in-depth genome analysis of a halotolerant probiotic bacterium Paenibacillus sp. S-12, a multifarious bacterium isolated from Rauvolfia serpentina. BMC Microbiol 2023; 23:192. [PMID: 37464310 PMCID: PMC10353221 DOI: 10.1186/s12866-023-02939-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 07/10/2023] [Indexed: 07/20/2023] Open
Abstract
BACKGROUND: Members of the genus Paenibacillus from diverse habitats have attracted great attention due to their multifarious properties. Considering that members of this genus are mostly free-living in soil, we characterized the genome of a halotolerant environmental isolate belonging to the genus Paenibacillus. Genome mining revealed the presence of CAZymes and of probiotic and stress-protection genes, suggesting that strain S-12 could be used for industrial and agricultural purposes. RESULTS: Molecular identification by 16S rRNA gene sequencing showed its closest match to other Paenibacillus species. The complete genome size of S-12 was 5.69 Mb, with a GC content of 46.5%. Genome analysis of S-12 revealed the presence of open reading frames (ORFs) encoding functions related to environmental stress tolerance, adhesion processes, multidrug efflux systems, and heavy metal resistance. Genome annotation identified various genes for chemotaxis, flagellar motility, and biofilm production, illustrating its strong colonization ability. CONCLUSION: The current findings provide an in-depth investigation of a probiotic Paenibacillus bacterium that possesses various genome features enabling it to survive under diverse conditions. The strain shows strong potential for probiotic applications.
Affiliation(s)
- Rajnish Prakash Singh
- Department of Biotechnology, Jaypee Institute of Information Technology, Noida, India.
| | - Kiran Kumari
- Department of Bioengineering and Biotechnology, Birla Institute of Technology, Mesra, Ranchi, India
| | - Parva Kumar Sharma
- Department of Plant Sciences and Landscape Architecture, University of Maryland, College Park, MD 20742, USA
| | - Ying Ma
- College of Resources and Environment, Southwest University, Chongqing, China
|
15
|
Jurdzinski KT, Mehrshad M, Delgado LF, Deng Z, Bertilsson S, Andersson AF. Large-scale phylogenomics of aquatic bacteria reveal molecular mechanisms for adaptation to salinity. Science Advances 2023; 9:eadg2059. [PMID: 37235649 PMCID: PMC10219603 DOI: 10.1126/sciadv.adg2059] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/10/2022] [Accepted: 04/21/2023] [Indexed: 05/28/2023]
Abstract
The crossing of environmental barriers poses major adaptive challenges. Rareness of freshwater-marine transitions separates the bacterial communities, but how these are related to brackish counterparts remains elusive, as do the molecular adaptations facilitating cross-biome transitions. We conducted large-scale phylogenomic analysis of freshwater, brackish, and marine quality-filtered metagenome-assembled genomes (11,248). Average nucleotide identity analyses showed that bacterial species rarely existed in multiple biomes. In contrast, distinct brackish basins cohosted numerous species, but their intraspecific population structures displayed clear signs of geographic separation. We further identified the most recent cross-biome transitions, which were rare, ancient, and most commonly directed toward the brackish biome. Transitions were accompanied by systematic changes in amino acid composition and isoelectric point distributions of inferred proteomes, which evolved over millions of years, as well as convergent gains or losses of specific gene functions. Therefore, adaptive challenges entailing proteome reorganization and specific changes in gene content constrain the cross-biome transitions, resulting in species-level separation between aquatic biomes.
Affiliation(s)
- Krzysztof T. Jurdzinski
- Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden
| | - Maliheh Mehrshad
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Luis Fernando Delgado
- Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden
| | - Ziling Deng
- Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden
| | - Stefan Bertilsson
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Anders F. Andersson
- Department of Gene Technology, KTH Royal Institute of Technology, Science for Life Laboratory, Stockholm, Sweden
|
16
|
Persi E, Wolf YI, Karamycheva S, Makarova KS, Koonin EV. Compensatory relationship between low-complexity regions and gene paralogy in the evolution of prokaryotes. Proc Natl Acad Sci U S A 2023; 120:e2300154120. [PMID: 37036997 PMCID: PMC10120016 DOI: 10.1073/pnas.2300154120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/04/2023] [Accepted: 03/17/2023] [Indexed: 04/12/2023] Open
Abstract
The evolution of genomes in all life forms involves two distinct, dynamic types of genomic changes: gene duplication (and loss) that shape families of paralogous genes and extension (and contraction) of low-complexity regions (LCR), which occurs through dynamics of short repeats in protein-coding genes. Although the roles of each of these types of events in genome evolution have been studied, their co-evolutionary dynamics is not thoroughly understood. Here, by analyzing a wide range of genomes from diverse bacteria and archaea, we show that LCR and paralogy represent two distinct routes of evolution that are inversely correlated. The emergence of LCR is a prominent evolutionary mechanism in fast evolving, young protein families, whereas paralogy dominates the comparatively slow evolution of old protein families. The analysis of multiple prokaryotic genomes shows that the formation of LCR is likely a widespread, transient evolutionary mechanism that temporally and locally affects also ancestral functions, but apparently, fades away with time, under mutational and selective pressures, yielding to gene paralogy. We propose that compensatory relationships between short-term and longer-term evolutionary mechanisms are universal in the evolution of life.
Affiliation(s)
- Erez Persi
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
| | - Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
| | - Svetlana Karamycheva
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
| | - Kira S. Makarova
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
| | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
|
17
|
Rodríguez-Gijón A, Buck M, Andersson AF, Izabel-Shen D, Nascimento FJA, Garcia SL. Linking prokaryotic genome size variation to metabolic potential and environment. ISME Communications 2023; 3:25. [PMID: 36973336 PMCID: PMC10042847 DOI: 10.1038/s43705-023-00231-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 03/02/2023] [Accepted: 03/14/2023] [Indexed: 03/29/2023]
Abstract
While theories and models have appeared to explain genome size as a result of evolutionary processes, little work has shown that genome sizes carry ecological signatures. Our work delves into the ecological implications of microbial genome size variation in benthic and pelagic habitats across environmental gradients of the brackish Baltic Sea. While depth is significantly associated with genome size in benthic and pelagic brackish metagenomes, salinity is only correlated to genome size in benthic metagenomes. Overall, we confirm that prokaryotic genome sizes in Baltic sediments (3.47 Mbp) are significantly bigger than in the water column (2.96 Mbp). While benthic genomes have a higher number of functions than pelagic genomes, the smallest genomes coded for a higher number of module steps per Mbp for most of the functions irrespective of their environment. Some examples of these functions are amino acid metabolism and central carbohydrate metabolism. However, we observed that nitrogen metabolism was almost absent in pelagic genomes and was mostly present in benthic genomes. Finally, we also show that Bacteria inhabiting Baltic sediments and water column not only differ in taxonomy, but also in their metabolic potential, such as the Wood-Ljungdahl pathway or the presence of different hydrogenases. Our work shows how microbial genome size is linked to abiotic factors in the environment, metabolic potential and taxonomic identity of Bacteria and Archaea within aquatic ecosystems.
Affiliation(s)
- Alejandro Rodríguez-Gijón
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, 106 91, Sweden.
- Science for Life Laboratory, Stockholm, Sweden.
| | - Moritz Buck
- Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Uppsala, Sweden
| | - Anders F Andersson
- Science for Life Laboratory, Stockholm, Sweden
- Department of Gene Technology, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
| | - Dandan Izabel-Shen
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, 106 91, Sweden
| | - Francisco J A Nascimento
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, 106 91, Sweden
- Baltic Sea Centre, Stockholm University, Stockholm, Sweden
| | - Sarahi L Garcia
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm, 106 91, Sweden.
- Science for Life Laboratory, Stockholm, Sweden.
|
18
|
Weltzer ML, Wall D. Social Diversification Driven by Mobile Genetic Elements. Genes (Basel) 2023; 14:648. [PMID: 36980919 PMCID: PMC10047993 DOI: 10.3390/genes14030648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 02/17/2023] [Accepted: 02/28/2023] [Indexed: 03/08/2023] Open
Abstract
Social diversification in microbes is an evolutionary process where lineages bifurcate into distinct populations that cooperate with themselves but not with other groups. In bacteria, this is frequently driven by horizontal transfer of mobile genetic elements (MGEs). Here, the resulting acquisition of new genes changes the recipient's social traits and consequently how they interact with kin. These changes include discriminating behaviors mediated by newly acquired effectors. Since the producing cell is protected by cognate immunity factors, these selfish elements benefit from selective discrimination against recent ancestors, thus facilitating their proliferation and benefiting the host. Whether social diversification benefits the population at large is less obvious. The widespread use of next-generation sequencing has recently provided new insights into population dynamics in natural habitats and the roles MGEs play. MGEs belong to accessory genomes, which often constitute the majority of the pangenome of a taxon, and contain most of the kin-discriminating loci that fuel rapid social diversification. We further discuss mechanisms of diversification and its consequences to populations and conclude with a case study involving myxobacteria.
Affiliation(s)
- Michael L Weltzer
- Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
| | - Daniel Wall
- Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
|
19
|
Wang Y, Cai X, Hu S, Qin S, Wang Z, Cao Y, Hou C, Yang J, Zhou W. Comparative genomic analysis provides insight into the phylogeny and potential mechanisms of adaptive evolution of Sphingobacterium sp. CZ-2. Gene 2023; 855:147118. [PMID: 36521669 DOI: 10.1016/j.gene.2022.147118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 11/21/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022]
Abstract
Sphingobacterium is a class of Gram-negative, non-fermentative bacilli that have received widespread attention due to their broad ecological distribution and oil degradation ability, but are rarely involved in infections. In this manuscript, a novel Sphingobacterium strain isolated from wildfire-infected tobacco leaves was named Sphingobacterium sp. CZ-2. NGS and TGS sequencing results showed a whole genome of 3.92 Mb with 40.68 mol% GC content, containing 3,462 protein-coding genes, 9 rRNA-coding genes and 50 tRNA-coding genes. Phylogenetic analysis, ANI and dDDH calculations all supported that Sphingobacterium sp. CZ-2 represented a novel species of the genus Sphingobacterium. Analysis of the specific genes of Sphingobacterium sp. CZ-2 by comparative genomics revealed that metal transport proteins encoded by the troD and cusA genes could maintain the balance of heavy metal ion concentrations in the internal environment of bacteria and avoid heavy metal toxicity while meeting the needs of growth and reproduction, and transport proteins encoded by the malG gene could keep nutrients required for the survival of bacteria. Synteny and genome evolutionary analyses of Sphingobacterium strains implicated gene family contraction as a major process in genome evolution, with insertional sequences leading to mutations, deletions and reversals of genes that help bacteria to withstand complex environmental changes. Complete genome sequencing and systematic comparative genomic analysis will contribute new insights into the adaptive evolution of this novel species and the genus Sphingobacterium.
Affiliation(s)
- Yongqiang Wang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Xunhui Cai
- School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
| | - Shengnan Hu
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Sidong Qin
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Ziqi Wang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Yixiang Cao
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Chaoliang Hou
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Jiangshan Yang
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
| | - Wei Zhou
- Hunan Provincial Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-Making, Hunan Agricultural University, Changsha 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China.
|
20
|
Madi N, Chen D, Wolff R, Shapiro BJ, Garud NR. Community diversity is associated with intra-species genetic diversity and gene loss in the human gut microbiome. eLife 2023; 12:e78530. [PMID: 36757364 PMCID: PMC9977275 DOI: 10.7554/elife.78530] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/16/2022] [Accepted: 02/08/2023] [Indexed: 02/10/2023] Open
Abstract
How the ecological process of community assembly interacts with intra-species diversity and evolutionary change is a longstanding question. Two contrasting hypotheses have been proposed: Diversity Begets Diversity (DBD), in which taxa tend to become more diverse in already diverse communities, and Ecological Controls (EC), in which higher community diversity impedes diversification. Previously, using 16S rRNA gene amplicon data across a range of microbiomes, we showed a generally positive relationship between taxa diversity and community diversity at higher taxonomic levels, consistent with the predictions of DBD (Madi et al., 2020). However, this positive 'diversity slope' plateaus at high levels of community diversity. Here we show that this general pattern holds at much finer genetic resolution, by analyzing intra-species strain and nucleotide variation in static and temporally sampled metagenomes from the human gut microbiome. Consistent with DBD, both intra-species polymorphism and strain number were positively correlated with community Shannon diversity. Shannon diversity is also predictive of increases in polymorphism over time scales up to ~4-6 months, after which the diversity slope flattens and becomes negative - consistent with DBD eventually giving way to EC. Finally, we show that higher community diversity predicts gene loss at a future time point. This observation is broadly consistent with the Black Queen Hypothesis, which posits that genes with functions provided by the community are less likely to be retained in a focal species' genome. Together, our results show that a mixture of DBD, EC, and Black Queen may operate simultaneously in the human gut microbiome, adding to a growing body of evidence that these eco-evolutionary processes are key drivers of biodiversity and ecosystem function.
Affiliation(s)
- Naïma Madi
- Département de sciences biologiques, Université de Montréal, Montréal, Canada
| | - Daisy Chen
- Computational and Systems Biology, University of California, Los Angeles, Los Angeles, United States
- Bioinformatics and Systems Biology Program, University of California, San Diego, San Diego, United States
| | - Richard Wolff
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
| | - B Jesse Shapiro
- Département de sciences biologiques, Université de Montréal, Montréal, Canada
- McGill Genome Centre, McGill University, Montreal, Canada
- Quebec Centre for Biodiversity Science, Montreal, Canada
- McGill Centre for Microbiome Research, Montreal, Canada
- Department of Microbiology and Immunology, McGill University, Montreal, Canada
| | - Nandita R Garud
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, United States
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
| |
|
21
|
Cerbino GN, Traglia GM, Ayala Nuñez T, Parmeciano Di Noto G, Ramírez MS, Centrón D, Iriarte A, Quiroga C. Comparative genome analysis of the genus Shewanella unravels the association of key genetic traits with known and potential pathogenic lineages. Front Microbiol 2023; 14:1124225. [PMID: 36925471 PMCID: PMC10011109 DOI: 10.3389/fmicb.2023.1124225] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/14/2022] [Accepted: 02/06/2023] [Indexed: 03/06/2023]
Abstract
Shewanella spp. are Gram-negative rods widely disseminated in aquatic niches that can also be found in human-associated environments. In recent years, reports of infections caused by these bacteria have increased significantly. Mobilome and resistome analysis of a few species showed that they are versatile; however, comprehensive comparative studies in the genus are lacking. Here, we analyzed the genetic traits of 144 genomes from Shewanella spp. isolates focusing on the mobilome, resistome, and virulome to establish their evolutionary relationship and detect unique features based on their genome content and habitat. Shewanella spp. showed a great diversity of mobile genetic elements (MGEs), most of them associated with monophyletic lineages of clinical isolates. Furthermore, 79/144 genomes encoded at least one antimicrobial resistant gene with their highest occurrence in clinical-related lineages. CRISPR-Cas systems, which confer immunity against MGEs, were found in 41 genomes being I-E and I-F the more frequent ones. Virulome analysis showed that all Shewanella spp. encoded different virulence genes (motility, quorum sensing, biofilm, adherence, etc.) that may confer adaptive advantages for survival against hosts. Our data revealed that key accessory genes are frequently found in two major clinical-related groups, which encompass the opportunistic pathogens Shewanella algae and Shewanella xiamenensis together with several other species. This work highlights the evolutionary nature of Shewanella spp. genomes, capable of acquiring different key genetic traits that contribute to their adaptation to different niches and facilitate the emergence of more resistant and virulent isolates that impact directly on human and animal health.
Affiliation(s)
- Gabriela N Cerbino
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Tecnológicas, Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPAM), Facultad de Medicina, Buenos Aires, Argentina
| | - German M Traglia
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
| | - Teolincacihuatl Ayala Nuñez
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Tecnológicas, Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPAM), Facultad de Medicina, Buenos Aires, Argentina
| | - Gisela Parmeciano Di Noto
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Tecnológicas, Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPAM), Facultad de Medicina, Buenos Aires, Argentina
| | - María Soledad Ramírez
- Center for Applied Biotechnology Studies, Department of Biological Science, California State University, Fullerton, Fullerton, CA, United States
| | - Daniela Centrón
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Tecnológicas, Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPAM), Facultad de Medicina, Buenos Aires, Argentina
| | - Andrés Iriarte
- Laboratorio de Biología Computacional, Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
| | - Cecilia Quiroga
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Tecnológicas, Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPAM), Facultad de Medicina, Buenos Aires, Argentina
| |
|
22
|
Karamycheva S, Wolf YI, Persi E, Koonin EV, Makarova KS. Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions. Biol Direct 2022; 17:22. [PMID: 36042479 PMCID: PMC9425974 DOI: 10.1186/s13062-022-00337-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/18/2022] [Accepted: 08/13/2022] [Indexed: 12/24/2022]
Abstract
Background: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy to estimate protein family level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs). Results: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate, primarily, not from gene duplication, but from acquisition of distant paralogs and xenologs, introducing sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. Additionally, using linear discriminant analysis, we found that variability and fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality as compared to the results of transposon mutagenesis in Sulfolobus islandicus. Conclusions: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and indicative of sub- or neofunctionalization of paralogs.
Affiliation(s)
- Svetlana Karamycheva
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Erez Persi
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA.
| |
|
23
|
Repeat sequences limit the effectiveness of lateral gene transfer and favored the evolution of meiotic sex in early eukaryotes. Proc Natl Acad Sci U S A 2022; 119:e2205041119. [PMID: 35994648 PMCID: PMC9436333 DOI: 10.1073/pnas.2205041119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
The transition from prokaryotic lateral gene transfer to eukaryotic meiotic sex is poorly understood. Phylogenetic evidence suggests that it was tightly linked to eukaryogenesis, which involved an unprecedented rise in both genome size and the density of genetic repeats. Expansion of genome size raised the severity of Muller's ratchet, while limiting the effectiveness of lateral gene transfer (LGT) at purging deleterious mutations. In principle, an increase in recombination length combined with higher rates of LGT could solve this problem. Here, we show using a computational model that this solution fails in the presence of genetic repeats prevalent in early eukaryotes. The model demonstrates that dispersed repeat sequences allow ectopic recombination, which leads to the loss of genetic information and curtails the capacity of LGT to prevent mutation accumulation. Increasing recombination length in the presence of repeat sequences exacerbates the problem. Mutational decay can only be resisted with homology along extended sequences of DNA. We conclude that the transition to homologous pairing along linear chromosomes was a key innovation in meiotic sex, which was instrumental in the expansion of eukaryotic genomes and morphological complexity.
|
24
|
Garza DR, von Meijenfeldt FAB, van Dijk B, Boleij A, Huynen MA, Dutilh BE. Nutrition or nature: using elementary flux modes to disentangle the complex forces shaping prokaryote pan-genomes. BMC Ecol Evol 2022; 22:101. [PMID: 35974327 PMCID: PMC9382767 DOI: 10.1186/s12862-022-02052-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Accepted: 07/22/2022] [Indexed: 11/15/2022]
Abstract
BACKGROUND: Microbial pan-genomes are shaped by a complex combination of stochastic and deterministic forces. Even closely related genomes exhibit extensive variation in their gene content. Understanding what drives this variation requires exploring the interactions of gene products with each other and with the organism's external environment. However, to date, conceptual models of pan-genome dynamics often represent genes as independent units and provide limited information about their mechanistic interactions. RESULTS: We simulated the stochastic process of gene-loss using the pooled genome-scale metabolic reaction networks of 46 taxonomically diverse bacterial and archaeal families as proxies for their pan-genomes. The frequency by which reactions are retained in functional networks when stochastic gene loss is simulated in diverse environments allowed us to disentangle the metabolic reactions whose presence depends on the metabolite composition of the external environment (constrained by "nutrition") from those that are independent of the environment (constrained by "nature"). By comparing the frequency of reactions from the first group with their observed frequencies in bacterial and archaeal families, we predicted the metabolic niches that shaped the genomic composition of these lineages. Moreover, we found that the lineages that were shaped by a more diverse metabolic niche also occur in more diverse biomes as assessed by global environmental sequencing datasets. CONCLUSION: We introduce a computational framework for analyzing and interpreting pan-reactomes that provides novel insights into the ecological and evolutionary drivers of pan-genome dynamics.
Affiliation(s)
- Daniel R Garza
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands.
- Microbial Systems Biology, Laboratory of Molecular Bacteriology, Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven, Louvain, Belgium.
| | - F A Bastiaan von Meijenfeldt
- Department of Marine Microbiology and Biogeochemistry (MMB), NIOZ Royal Netherlands Institute for Sea Research, PO Box 59, 1790 AB, Den Burg, The Netherlands
| | - Bram van Dijk
- Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
| | - Annemarie Boleij
- Department of Pathology, Radboud Institute for Molecular Life Sciences (RIMLS), Radboud University Medical Center, Geert Grooteplein-Zuid 10, 6525 GA, Nijmegen, The Netherlands
| | - Martijn A Huynen
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
| | - Bas E Dutilh
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
- Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, The Netherlands
- Institute of Biodiversity, Faculty of Biology, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University, Jena, Germany
| |
|
25
|
Bremer N, Knopp M, Martin WF, Tria FDK. Realistic Gene Transfer to Gene Duplication Ratios Identify Different Roots in the Bacterial Phylogeny Using a Tree Reconciliation Method. Life (Basel, Switzerland) 2022; 12:life12070995. [PMID: 35888084 PMCID: PMC9322720 DOI: 10.3390/life12070995] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/10/2022] [Revised: 06/29/2022] [Accepted: 07/01/2022] [Indexed: 11/29/2022]
Abstract
The rooting of phylogenetic trees permits important inferences about ancestral states and the polarity of evolutionary events. Recently, methods that reconcile discordance between gene-trees and species-trees—tree reconciliation methods—are becoming increasingly popular for rooting species trees. Rooting via reconciliation requires values for a particular parameter, the gene transfer to gene duplication ratio (T:D), which in current practice is estimated on the fly from discordances observed in the trees. To date, the accuracy of T:D estimates obtained by reconciliation analyses has not been compared to T:D estimates obtained by independent means, hence the effect of T:D upon inferences of species tree roots is altogether unexplored. Here we investigated the issue in detail by performing tree reconciliations of more than 10,000 gene trees under a variety of T:D ratios for two phylogenetic cases: a bacterial (prokaryotic) tree with 265 species and a fungal-metazoan (eukaryotic) tree with 31 species. We show that the T:D ratios automatically estimated by a current tree reconciliation method, ALE, generate virtually identical T:D ratios across bacterial genes and fungal-metazoan genes. The T:D ratios estimated by ALE differ 10- to 100-fold from robust, ALE-independent estimates from real data. More important is our finding that the root inferences using ALE in both datasets are strongly dependent upon T:D. Using more realistic T:D ratios, the number of roots inferred by ALE consistently increases and, in some cases, clearly incorrect roots are inferred. Furthermore, our analyses reveal that gene duplications have a far greater impact on ALE’s preferences for phylogenetic root placement than gene transfers or gene losses do. Overall, we show that obtaining reliable species tree roots with ALE is only possible when gene duplications are abundant in the data and the number of falsely inferred gene duplications is low. 
Finding a sufficient sample of true gene duplications for rooting species trees critically depends on the T:D ratios used in the analyses. T:D ratios, while being important parameters of genome evolution in their own right, affect the root inferences with tree reconciliations to an unanticipated degree.
|
26
|
Abstract
We apply the theory of learning to physically renormalizable systems in an attempt to outline a theory of biological evolution, including the origin of life, as multilevel learning. We formulate seven fundamental principles of evolution that appear to be necessary and sufficient to render a universe observable and show that they entail the major features of biological evolution, including replication and natural selection. It is shown that these cornerstone phenomena of biology emerge from the fundamental features of learning dynamics such as the existence of a loss function, which is minimized during learning. We then sketch the theory of evolution using the mathematical framework of neural networks, which provides for detailed analysis of evolutionary phenomena. To demonstrate the potential of the proposed theoretical framework, we derive a generalized version of the Central Dogma of molecular biology by analyzing the flow of information during learning (back propagation) and predicting (forward propagation) the environment by evolving organisms. The more complex evolutionary phenomena, such as major transitions in evolution (in particular, the origin of life), have to be analyzed in the thermodynamic limit, which is described in detail in the paper by Vanchurin et al. [V. Vanchurin, Y. I. Wolf, E. V. Koonin, M. I. Katsnelson, Proc. Natl. Acad. Sci. U.S.A. 119, 10.1073/pnas.2120042119 (2022)].
|
27
|
Cummins EA, Hall RJ, McInerney JO, McNally A. Prokaryote pangenomes are dynamic entities. Curr Opin Microbiol 2022; 66:73-78. [PMID: 35104691 DOI: 10.1016/j.mib.2022.01.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/25/2021] [Revised: 01/07/2022] [Accepted: 01/11/2022] [Indexed: 11/24/2022]
Abstract
Prokaryote pangenomes are influenced heavily by environmental factors and the opportunity for gene gain and loss events. As the field of pangenome analysis has expanded, so has the need to fully understand the complexity of how eco-evolutionary dynamics shape pangenomes. Here, we describe current models of pangenome evolution and discuss their suitability and accuracy. We suggest that pangenomes are dynamic entities under constant flux, highlighting the influence of two-way interactions between pangenome and environment. New classifications of core and accessory genes are also considered, underscoring the need for continuous evaluation of nomenclature in a fast-moving field. We conclude that future models of pangenome evolution should incorporate eco-evolutionary dynamics to fully encompass their dynamic, changeable nature.
Affiliation(s)
- Elizabeth A Cummins
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Rebecca J Hall
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK.
| | - James O McInerney
- School of Life Sciences, University of Nottingham, Nottingham, NG7 2UH, UK
| | - Alan McNally
- Institute of Microbiology and Infection, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK.
| |
|
28
|
Cargo Genes of Tn7-Like Transposons Comprise an Enormous Diversity of Defense Systems, Mobile Genetic Elements, and Antibiotic Resistance Genes. mBio 2021; 12:e0293821. [PMID: 34872347 PMCID: PMC8649781 DOI: 10.1128/mbio.02938-21] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Indexed: 12/12/2022]
Abstract
Transposition is a major mechanism of horizontal gene mobility in prokaryotes. However, exploration of the genes mobilized by transposons (cargo) is hampered by the difficulty in delineating integrated transposons from their surrounding genetic context. Here, we present a computational approach that allowed us to identify the boundaries of 6,549 Tn7-like transposons. We found that 96% of these transposons carry at least one cargo gene. Delineation of distinct communities in a gene-sharing network demonstrates how transposons function as a conduit of genes between phylogenetically distant hosts. Comparative analysis of the cargo genes reveals significant enrichment of mobile genetic elements (MGEs) nested within Tn7-like transposons, such as insertion sequences and toxin-antitoxin modules, and of genes involved in recombination, anti-MGE defense, and antibiotic resistance. More unexpectedly, cargo also includes genes encoding central carbon metabolism enzymes. Twenty-two Tn7-like transposons carry both an anti-MGE defense system and antibiotic resistance genes, illustrating how bacteria can overcome these combined pressures upon acquisition of a single transposon. This work substantially expands the distribution of Tn7-like transposons, defines their evolutionary relationships, and provides a large-scale functional classification of prokaryotic genes mobilized by transposition.
|
29
|
Li Y, Jiang B, Dai W. A large-scale whole-genome sequencing analysis reveals false positives of bacterial essential genes. Appl Microbiol Biotechnol 2021; 106:341-347. [PMID: 34889987 DOI: 10.1007/s00253-021-11702-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Revised: 11/05/2021] [Accepted: 11/15/2021] [Indexed: 11/26/2022]
Abstract
Essential genes are crucial for bacterial viability and represent attractive targets for novel anti-pathogen drug discovery. However, essential genes determined by the transposon insertion sequencing (Tn-seq) approach often contain many false positives. We hypothesized that some of those false positives are genes that are actually deleted from the genome, so they do not present any transposon insertion in the course of Tn-seq analysis. Based on this assumption, we performed a large-scale whole-genome sequencing analysis for the bacterium of interest. Our analysis revealed that some "essential genes" are indeed removed from the analyzed bacterial genomes. Since these genes were kicked out by bacteria, they should not be defined as essential. Our work showed that gene deletion is one of the false positive sources of essentiality determination, which is apparently underestimated in previous studies. We suggest subtracting the genome backgrounds before the evaluation of Tn-seq, and created a list of false positive gene essentiality as a reference for the downstream application. KEY POINTS: • Discovery of false positives of essential genes defined previously through the analyses of a large scale of whole-genome sequencing data • These false positives are the results of gene deletions in the studied genomes • Sequencing the target genome before Tn-seq analysis is of importance while some studies neglected it.
Affiliation(s)
- Yuanhao Li
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, 510006, China
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Center, South China Agricultural University, Guangzhou, 510642, China
| | - Bo Jiang
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, 510006, China
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Center, South China Agricultural University, Guangzhou, 510642, China
| | - Weijun Dai
- Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou, 510006, China.
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Center, South China Agricultural University, Guangzhou, 510642, China.
| |
|
30
|
Kloub L, Gosselin S, Fullmer M, Graf J, Gogarten JP, Bansal MS. Systematic Detection of Large-Scale Multigene Horizontal Transfer in Prokaryotes. Mol Biol Evol 2021; 38:2639-2659. [PMID: 33565580 PMCID: PMC8136488 DOI: 10.1093/molbev/msab043] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/25/2022]
Abstract
Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the “scale” of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multigene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale data set of over 22,000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multigene transfer. Among other insights, we find that 1) the observed relative frequency of HMGT increases as divergence between genomes increases, 2) HMGTs often have conserved gene functions, and 3) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.
Affiliation(s)
- Lina Kloub
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA
| | - Sean Gosselin
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Matthew Fullmer
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA; Bioinformatics Institute, School of Biological Sciences, The University of Auckland, Auckland, New Zealand
| | - Joerg Graf
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA; The Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | - Johann Peter Gogarten
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA; The Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | - Mukul S Bansal
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA; The Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| |
|
31
|
Sheinman M, Arkhipova K, Arndt PF, Dutilh BE, Hermsen R, Massip F. Identical sequences found in distant genomes reveal frequent horizontal transfer across the bacterial domain. eLife 2021; 10:62719. [PMID: 34121661 PMCID: PMC8270642 DOI: 10.7554/elife.62719] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 09/03/2020] [Accepted: 06/13/2021] [Indexed: 12/19/2022]
Abstract
Horizontal gene transfer (HGT) is an essential force in microbial evolution. Despite detailed studies on a variety of systems, a global picture of HGT in the microbial world is still missing. Here, we exploit that HGT creates long identical DNA sequences in the genomes of distant species, which can be found efficiently using alignment-free methods. Our pairwise analysis of 93,481 bacterial genomes identified 138,273 HGT events. We developed a model to explain their statistical properties as well as estimate the transfer rate between pairs of taxa. This reveals that long-distance HGT is frequent: our results indicate that HGT between species from different phyla has occurred in at least 8% of the species. Finally, our results confirm that the function of sequences strongly impacts their transfer rate, which varies by more than three orders of magnitude between different functional categories. Overall, we provide a comprehensive view of HGT, illuminating a fundamental process driving bacterial evolution.
Affiliation(s)
- Michael Sheinman
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands; Division of Molecular Carcinogenesis, the Netherlands Cancer Institute, Amsterdam, Netherlands
| | - Ksenia Arkhipova
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands
| | - Peter F Arndt
- Max Planck Institute for Molecular Genetics, Berlin, Germany
| | - Bas E Dutilh
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands
| | - Rutger Hermsen
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands
| | - Florian Massip
- Berlin Institute for Medical Systems Biology, Max Delbrück Center, Berlin, Germany; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
| |
|
32
|
Rainio MJ, Ruuskanen S, Helander M, Saikkonen K, Saloniemi I, Puigbò P. Adaptation of bacteria to glyphosate: a microevolutionary perspective of the enzyme 5-enolpyruvylshikimate-3-phosphate synthase. Environmental Microbiology Reports 2021; 13:309-316. [PMID: 33530134 DOI: 10.1111/1758-2229.12931] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/01/2020] [Accepted: 01/20/2021] [Indexed: 06/12/2023]
Abstract
Glyphosate is the leading herbicide worldwide, but it also affects prokaryotes because it targets the central enzyme (5-enolpyruvylshikimate-3-phosphate, EPSP) of the shikimate pathway in the synthesis of the three essential aromatic amino acids in bacteria, fungi and plants. Our results reveal that bacteria may easily become resistant to glyphosate through changes in the 5-enolpyruvylshikimate-3-phosphate synthase active site. This indicates the importance of examining how glyphosate affects microbe-mediated ecosystem functions and human microbiomes.
Affiliation(s)
- Miia J Rainio
- Department of Biology, University of Turku, Turku, Finland
| | - Suvi Ruuskanen
- Department of Biology, University of Turku, Turku, Finland
| | - Marjo Helander
- Department of Biology, University of Turku, Turku, Finland
| | | | - Irma Saloniemi
- Department of Biology, University of Turku, Turku, Finland
| | - Pere Puigbò
- Department of Biology, University of Turku, Turku, Finland
- Nutrition and Health Unit, Eurecat Technology Centre of Catalonia, Reus, Catalonia, Spain
- Department of Biochemistry and Biotechnology, Rovira i Virgili University, Tarragona, Catalonia, Spain
|
33
|
Liu Y, Makarova KS, Huang WC, Wolf YI, Nikolskaya AN, Zhang X, Cai M, Zhang CJ, Xu W, Luo Z, Cheng L, Koonin EV, Li M. Expanded diversity of Asgard archaea and their relationships with eukaryotes. Nature 2021; 593:553-557. [PMID: 33911286 PMCID: PMC11165668 DOI: 10.1038/s41586-021-03494-3] [Citation(s) in RCA: 135] [Impact Index Per Article: 45.0] [Received: 11/15/2020] [Accepted: 03/26/2021] [Indexed: 01/21/2023]
Abstract
Asgard is a recently discovered superphylum of archaea that appears to include the closest archaeal relatives of eukaryotes [1-5]. Debate continues as to whether the archaeal ancestor of eukaryotes belongs within the Asgard superphylum or whether this ancestor is a sister group to all other archaea (that is, a two-domain versus a three-domain tree of life) [6-8]. Here we present a comparative analysis of 162 complete or nearly complete genomes of Asgard archaea, including 75 metagenome-assembled genomes that, to our knowledge, have not previously been reported. Our results substantially expand the phylogenetic diversity of Asgard and lead us to propose six additional phyla that include a deep branch that we have provisionally named Wukongarchaeota. Our phylogenomic analysis does not resolve unequivocally the evolutionary relationship between eukaryotes and Asgard archaea, but instead, depending on the choice of species and conserved genes used to build the phylogeny, supports either the origin of eukaryotes from within Asgard (as a sister group to the expanded Heimdallarchaeota-Wukongarchaeota branch) or a deeper branch for the eukaryote ancestor within archaea. Our comprehensive protein domain analysis using the 162 Asgard genomes results in a major expansion of the set of eukaryotic signature proteins. The Asgard eukaryotic signature proteins show variable phyletic distributions and domain architectures, which is suggestive of dynamic evolution through horizontal gene transfer, gene loss, gene duplication and domain shuffling. The phylogenomics of the Asgard archaea points to the accumulation of the components of the mobile archaeal 'eukaryome' in the archaeal ancestor of eukaryotes (within or outside Asgard) through extensive horizontal gene transfer.
Affiliation(s)
- Yang Liu
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P. R. China
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Wen-Cong Huang
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P. R. China
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Anastasia N Nikolskaya
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Xinxu Zhang
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P. R. China
| | - Mingwei Cai
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P. R. China
| | - Cui-Jing Zhang
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P. R. China
| | - Wei Xu
- Key Laboratory of Marine Biogenetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, P. R. China
| | - Zhuhua Luo
- Key Laboratory of Marine Biogenetic Resources, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, P. R. China
| | - Lei Cheng
- Key Laboratory of Development and Application of Rural Renewable Energy, Biogas Institute of Ministry of Agriculture, Chengdu, P. R. China
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
| | - Meng Li
- Shenzhen Key Laboratory of Marine Microbiome Engineering, Institute for Advanced Study, Shenzhen University, Shenzhen, P. R. China.
|
34
|
Allopatric Plant Pathogen Population Divergence following Disease Emergence. Appl Environ Microbiol 2021; 87:AEM.02095-20. [PMID: 33483307 DOI: 10.1128/aem.02095-20] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/25/2020] [Accepted: 01/13/2021] [Indexed: 12/19/2022] Open
Abstract
Within the landscape of globally distributed pathogens, populations differentiate via both adaptive and nonadaptive forces. Individual populations are likely to show unique trends of genetic diversity, host-pathogen interaction, and ecological adaptation. In plant pathogens, allopatric divergence may occur particularly rapidly within simplified agricultural monoculture landscapes. As such, the study of plant pathogen populations in monocultures can highlight the distinct evolutionary mechanisms that lead to local genetic differentiation. Xylella fastidiosa is a plant pathogen known to infect and damage multiple monocultures worldwide. One subspecies, Xylella fastidiosa subsp. fastidiosa, was first introduced to the United States ∼150 years ago, where it was found to infect and cause disease in grapevines (Pierce's disease of grapevines, or PD). Here, we studied PD-causing subsp. fastidiosa populations, with an emphasis on those found in the United States. Our study shows that following their establishment in the United States, PD-causing strains likely split into populations on the East and West Coasts. This diversification has occurred via both changes in gene content (gene gain/loss events) and variations in nucleotide sequence (mutation and recombination). In addition, we reinforce the notion that PD-causing populations within the United States acted as the source for subsequent subsp. fastidiosa outbreaks in Europe and Asia. IMPORTANCE: Compared to natural environments, the reduced diversity of monoculture agricultural landscapes can lead bacterial plant pathogens to quickly adapt to local biological and ecological conditions. Because of this, accidental introductions of microbial pathogens into naive regions represent a significant economic and environmental threat. Xylella fastidiosa is a plant pathogen with an expanding host and geographic range due to multiple intra- and intercontinental introductions. X. fastidiosa subsp. fastidiosa infects and causes disease in grapevines (Pierce's disease of grapevines [PD]). This study focused on PD-causing X. fastidiosa populations, particularly those found in the United States but also invasions into Taiwan and Spain. The analysis shows that PD-causing X. fastidiosa has diversified via multiple cooccurring evolutionary forces acting at an intra- and interpopulation level. This analysis enables a better understanding of the mechanisms leading to the local adaptation of X. fastidiosa and how a plant pathogen diverges allopatrically after multiple and sequential introduction events.
|
35
|
Han X, Guo J, Pang E, Song H, Lin K. Ab Initio Construction and Evolutionary Analysis of Protein-Coding Gene Families with Partially Homologous Relationships: Closely Related Drosophila Genomes as a Case Study. Genome Biol Evol 2021; 12:185-202. [PMID: 32108239 PMCID: PMC7144356 DOI: 10.1093/gbe/evaa041] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 02/18/2020] [Indexed: 01/05/2023] Open
Abstract
How have genes evolved within a well-known genome phylogeny? Many protein-coding genes should have evolved as a whole at the gene level, and some should have evolved partly through fragments at the subgene level. To comprehensively explore such complex homologous relationships and better understand gene family evolution, here, with de novo-identified modules, the subgene units which could consecutively cover proteins within a set of closely related species, we applied a new phylogeny-based approach that considers evolutionary models with partial homology to classify all protein-coding genes in nine Drosophila genomes. Compared with two other popular methods for gene family construction, our approach improved practical gene family classifications with a more reasonable view of homology and provided a much more complete landscape of gene family evolution at the gene and subgene levels. In the case study, we found that most expanded gene families might have evolved mainly through module rearrangements rather than gene duplications and mainly generated single-module genes through partial gene duplication, suggesting that there might be pervasive subgene rearrangement in the evolution of protein-coding gene families. The use of a phylogeny-based approach with partial homology to classify and analyze protein-coding gene families may provide us with a more comprehensive landscape depicting how genes evolve within a well-known genome phylogeny.
Affiliation(s)
- Xia Han
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Jindan Guo
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Erli Pang
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Hongtao Song
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
| | - Kui Lin
- State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, China
|
36
|
Sela I, Wolf YI, Koonin EV. Assessment of assumptions underlying models of prokaryotic pangenome evolution. BMC Biol 2021; 19:27. [PMID: 33563283 PMCID: PMC7874442 DOI: 10.1186/s12915-021-00960-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/27/2020] [Accepted: 01/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The genomes of bacteria and archaea evolve by extensive loss and gain of genes which, for any group of related prokaryotic genomes, result in the formation of a pangenome with the universal, asymmetrical U-shaped distribution of gene commonality. However, the evolutionary factors that define the specific shape of this distribution are not thoroughly understood. RESULTS: We investigate the fit of simple models of genome evolution to the empirically observed gene commonality distributions and genome intersections for 33 groups of closely related bacterial genomes. A model with an infinite external gene pool available for gene acquisition and constant genome size (IGP-CGS model), and two gene turnover rates, one for slow- and the other one for fast-evolving genes, allows two approaches to estimate the parameters for gene content dynamics. One is by fitting the model prediction to the distribution of the number of genes shared by precisely k genomes (gene commonality distribution) and another by analyzing the distribution of the number of genes common for k genome sets (k-cores). Both approaches produce a comparable overall quality of fit, although the former significantly overestimates the number of the universally conserved genes, while the latter overestimates the number of singletons. We further explore the effect of dropping each of the assumptions of the IGP-CGS model on the fit to the gene commonality distributions and show that models with either a finite gene pool or unequal rates of gene loss and gain (greater gene loss rate) eliminate the overestimate of the number of singletons or the core genome size. CONCLUSIONS: We examine the assumptions that are usually adopted for modeling the evolution of the U-shaped gene commonality distributions in prokaryote genomes, namely, those of infinitely many genes and constant genome size. The combined analysis of genome intersections and gene commonality suggests that at least one of these assumptions is invalid. The violation of both these assumptions reflects the limited ability of prokaryotes to gain new genes. This limitation seems to stem, at least partly, from the horizontal gene transfer barrier, i.e., the cost of accommodation of foreign genes by prokaryotes. Further development of models taking into account the complexity of microbial evolution is necessary for an improved understanding of the evolution of prokaryotes.
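As an illustration of the gene commonality distribution described in this abstract (the number of genes shared by precisely k genomes), the following minimal Python sketch counts, for a toy set of genomes, how many gene families occur in exactly k of them. The function name `gene_commonality` and the toy data are assumptions made here for illustration; they are not taken from the paper, and the sketch does not implement the IGP-CGS model or the k-core analysis.

```python
from collections import Counter


def gene_commonality(genomes):
    """Return {k: number of gene families present in exactly k genomes}.

    `genomes` is an iterable of collections of gene-family identifiers,
    one collection per genome (hypothetical input format).
    """
    presence = Counter()
    for genes in genomes:
        # Count each gene family at most once per genome.
        presence.update(set(genes))
    # Histogram of "number of genomes carrying the family".
    histogram = Counter(presence.values())
    return dict(sorted(histogram.items()))


# Toy data (made up): three genomes sharing a small core.
toy_genomes = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "f"},
]

print(gene_commonality(toy_genomes))  # {1: 3, 2: 1, 3: 2}
```

In this toy case the counts are highest at k = 1 (singletons) and k = 3 (genes in every genome), which is the qualitative U shape the abstract refers to.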
Affiliation(s)
- Itamar Sela
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
|
37
|
Koonin EV, Makarova KS, Wolf YI. Evolution of Microbial Genomics: Conceptual Shifts over a Quarter Century. Trends Microbiol 2021; 29:582-592. [PMID: 33541841 DOI: 10.1016/j.tim.2021.01.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 12/03/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 12/20/2022]
Abstract
Prokaryote genomics started in earnest in 1995, with the complete sequences of two small bacterial genomes, those of Haemophilus influenzae and Mycoplasma genitalium. During the next quarter century, the prokaryote genome database has been growing exponentially, with no saturation in sight. For most of these 25 years, genome sequencing remained limited to cultivable microbes. Together with next-generation sequencing methods, advances in metagenomics and single-cell genomics have lifted this limitation, providing for an increasingly unbiased characterization of the global prokaryote diversity. Advances in computational genomics followed the progress of genome sequencing, even if occasionally lagging behind. Several major new branches of bacteria and archaea were discovered, including Asgard archaea, the apparent closest relatives of eukaryotes and expansive groups of bacteria and archaea with small genomes thought to be symbionts of other prokaryotes. Comparative analysis of numerous prokaryote genomes spanning a wide range of evolutionary distances changed the conceptual foundations of microbiology, supplanting the notion of species genomes with fixed gene sets with that of dynamic pangenomes and the notion of a single Tree of Life (ToL) with a statistical tree-like trend among individual gene trees. Strides were also made towards a theory and quantitative laws of prokaryote genome evolution.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA.
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA
|
38
|
Domingo-Sananes MR, McInerney JO. Mechanisms That Shape Microbial Pangenomes. Trends Microbiol 2021; 29:493-503. [PMID: 33423895 DOI: 10.1016/j.tim.2020.12.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 10/12/2020] [Revised: 12/09/2020] [Accepted: 12/10/2020] [Indexed: 01/02/2023]
Abstract
Analyses of multiple whole-genome sequences from the same species have revealed that differences in gene content can be substantial, particularly in prokaryotes. Such variation has led to the recognition of pangenomes, the complete set of genes present in a species - consisting of core genes, present in all individuals, and accessory genes whose presence is variable. Questions now arise about how pangenomes originate and evolve. We describe how gene content variation can arise as a result of the combination of several processes, including random drift, selection, gain/loss balance, and the influence of ecological and epistatic interactions. We believe that identifying the contributions of these processes to pangenomes will need novel theoretical approaches and empirical data.
Affiliation(s)
- Maria Rosa Domingo-Sananes
- School of Life Sciences, University of Nottingham, Nottingham, UK; School of Science and Technology, Nottingham Trent University, Nottingham, UK.
|
39
|
Bonnici V, Maresi E, Giugno R. Challenges in gene-oriented approaches for pangenome content discovery. Brief Bioinform 2020; 22:5901976. [PMID: 32893299 DOI: 10.1093/bib/bbaa198] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/28/2020] [Revised: 05/14/2020] [Accepted: 08/04/2020] [Indexed: 01/17/2023] Open
Abstract
Given a group of genomes, represented as the sets of genes that belong to them, the discovery of the pangenomic content is based on the search for genetic homology among the genes in order to cluster them into families. Pangenomic analyses then investigate the membership of the families in the given genomes. This approach is referred to as the gene-oriented approach, in contrast to other definitions of the problem that take into account different genomic features. In the past years, several tools have been developed to discover and analyse pangenomic contents. Because the problem is hard, each tool applies a different strategy for discovering the pangenomic content. As a result, the performance of each tool depends on the composition of the input genomes. This review reports the main analysis instruments provided by the current state-of-the-art tools for the discovery of pangenomic contents. Moreover, unlike previous works, the presented study compares pangenomic tools from a methodological perspective, analysing the causes that lead a given methodology to outperform other tools. The analysis is performed by taking into account different bacterial populations, which are synthetically generated by changing evolutionary parameters. The benchmarks used to compare the pangenomic tools, in addition to the computational pipeline developed for this purpose, are available at https://github.com/InfOmics/pangenes-review. Contact: V. Bonnici, R. Giugno. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
Affiliation(s)
| | - Emiliano Maresi
- The Microsoft Research, University of Trento Centre for Computational and Systems Biology
| | - Rosalba Giugno
- Computer Science and Bioinformatics, referent of the Master Degree in Medical Bioinformatics
|
40
|
Flores-Bautista E, Hernandez-Guerrero R, Huerta-Saquero A, Tenorio-Salgado S, Rivera-Gomez N, Romero A, Ibarra JA, Perez-Rueda E. Deciphering the functional diversity of DNA-binding transcription factors in Bacteria and Archaea organisms. PLoS One 2020; 15:e0237135. [PMID: 32822422 PMCID: PMC7446807 DOI: 10.1371/journal.pone.0237135] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/06/2020] [Accepted: 07/20/2020] [Indexed: 11/18/2022] Open
Abstract
DNA-binding Transcription Factors (TFs) play a central role in regulation of gene expression in prokaryotic organisms, and similarities at the sequence level have been reported. These proteins are predicted with different abundances as a consequence of genome size, where small genomes contain a low proportion of TFs and large genomes contain a high proportion of TFs. In this work, we analyzed a collection of 668 experimentally validated TFs across 30 different species from diverse taxonomical classes, including Escherichia coli K-12, Bacillus subtilis 168, Corynebacterium glutamicum, and Streptomyces coelicolor, among others. This collection of TFs, together with 111 hidden Markov model profiles associated with DNA-binding TFs collected from diverse databases such as PFAM and DBD, was used to identify the repertoire of proteins putatively devoted to gene regulation in 1321 representative genomes of Archaea and Bacteria. The predicted regulatory proteins were subsequently analyzed in terms of their genomic context, allowing the prediction of functions for TFs and their neighbor genes, such as genes involved in virulence, enzymatic functions, phosphorylation mechanisms, and antibiotic resistance. The functional analysis associated with PFAM groups showed that diverse functional categories were significantly enriched in the collection of TFs and the proteins encoded by the neighbor genes, in particular, small-molecule binding and amino acid transmembrane transporter activities associated with the LysR family and proteins devoted to cellular aromatic compound metabolic processes or responses to drugs, stress, or abiotic stimuli in the MarR family. We consider that with the increasing data derived from new technologies, novel TFs can be identified and help improve the predictions for this class of proteins in complete genomes. The complete collection of experimentally characterized and predicted TFs is available at http://web.pcyt.unam.mx/EntrafDB/.
Affiliation(s)
- Emanuel Flores-Bautista
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
| | - Rafael Hernandez-Guerrero
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
| | - Alejandro Huerta-Saquero
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México, Ensenada, Baja California, México
| | - Silvia Tenorio-Salgado
- Tecnológico Nacional de México, Instituto Tecnológico de Mérida, Mérida, Yucatán, México
| | | | - Alba Romero
- Microbiota Host Interactions and Clostridia Research Group, Universidad Nacional Andrés Bello, Santiago de Chile, Chile
| | - Jose Antonio Ibarra
- Laboratorio de Genética Microbiana, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
| | - Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, Mérida, Yucatán, México
- Centro de Genómica y Bioinformática, Facultad de Ciencias, Universidad Mayor, Santiago, Chile
|
41
|
Hall RJ, Whelan FJ, McInerney JO, Ou Y, Domingo-Sananes MR. Horizontal Gene Transfer as a Source of Conflict and Cooperation in Prokaryotes. Front Microbiol 2020; 11:1569. [PMID: 32849327 PMCID: PMC7396663 DOI: 10.3389/fmicb.2020.01569] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 02/19/2020] [Accepted: 06/17/2020] [Indexed: 02/01/2023] Open
Abstract
Horizontal gene transfer (HGT) is one of the most important processes in prokaryote evolution. The sharing of DNA can spread neutral or beneficial genes, as well as genetic parasites across populations and communities, creating a large proportion of the variability acted on by natural selection. Here, we highlight the role of HGT in enhancing the opportunities for conflict and cooperation within and between prokaryote genomes. We discuss how horizontally acquired genes can cooperate or conflict both with each other and with a recipient genome, resulting in signature patterns of gene co-occurrence, avoidance, and dependence. We then describe how interactions involving horizontally transferred genes may influence cooperation and conflict at higher levels (populations, communities, and symbioses). Finally, we consider the benefits and drawbacks of HGT for prokaryotes and its fundamental role in understanding conflict and cooperation from the gene-gene to the microbiome level.
Affiliation(s)
- Rebecca J Hall
- School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
| | - Fiona J Whelan
- School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
| | - James O McInerney
- School of Life Sciences, University of Nottingham, Nottingham, United Kingdom; Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, United Kingdom
| | - Yaqing Ou
- Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, United Kingdom
|
42
|
Abstract
The last universal cellular ancestor (LUCA) is the most recent population of organisms from which all cellular life on Earth descends. The reconstruction of the genome and phenotype of the LUCA is a major challenge in evolutionary biology. Given that all life forms are associated with viruses and/or other mobile genetic elements, there is no doubt that the LUCA was a host to viruses. Here, by projecting back in time using the extant distribution of viruses across the two primary domains of life, bacteria and archaea, and tracing the evolutionary histories of some key virus genes, we attempt a reconstruction of the LUCA virome. Even a conservative version of this reconstruction suggests a remarkably complex virome that already included the main groups of extant viruses of bacteria and archaea. We further present evidence of extensive virus evolution antedating the LUCA. The presence of a highly complex virome implies the substantial genomic and pan-genomic complexity of the LUCA itself.
|
43
|
van Dijk B, Hogeweg P, Doekes HM, Takeuchi N. Slightly beneficial genes are retained by bacteria evolving DNA uptake despite selfish elements. eLife 2020; 9:e56801. [PMID: 32432548 PMCID: PMC7316506 DOI: 10.7554/elife.56801] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/10/2020] [Accepted: 05/15/2020] [Indexed: 12/11/2022] Open
Abstract
Horizontal gene transfer (HGT) and gene loss result in rapid changes in the gene content of bacteria. While HGT aids bacteria to adapt to new environments, it also carries risks such as selfish genetic elements (SGEs). Here, we use modelling to study how HGT of slightly beneficial genes impacts growth rates of bacterial populations, and if bacterial collectives can evolve to take up DNA despite selfish elements. We find four classes of slightly beneficial genes: indispensable, enrichable, rescuable, and unrescuable genes. Rescuable genes - genes with small fitness benefits that are lost from the population without HGT - can be collectively retained by a community that engages in costly HGT. While this 'gene-sharing' cannot evolve in well-mixed cultures, it does evolve in a spatial population like a biofilm. Despite enabling infection by harmful SGEs, the uptake of foreign DNA is evolutionarily maintained by the hosts, explaining the coexistence of bacteria and SGEs.
Affiliation(s)
- Bram van Dijk
- Utrecht University, Theoretical Biology, Utrecht, Netherlands
| | | | - Hilje M Doekes
- Utrecht University, Theoretical Biology, Utrecht, Netherlands
| | - Nobuto Takeuchi
- University of Auckland, Biological Sciences, Auckland, New Zealand
|
44
|
Castillo JA, Secaira-Morocho H, Maldonado S, Sarmiento KN. Diversity and Evolutionary Dynamics of Antiphage Defense Systems in Ralstonia solanacearum Species Complex. Front Microbiol 2020; 11:961. [PMID: 32508782 PMCID: PMC7251935 DOI: 10.3389/fmicb.2020.00961] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/28/2020] [Accepted: 04/22/2020] [Indexed: 12/20/2022] Open
Abstract
Over the years, many researchers have reported a great diversity of bacteriophages infecting members of the Ralstonia solanacearum species complex (RSSC). This diversity has driven bacterial evolution by leading to the emergence and maintenance of bacterial defense systems to combat phage infection. In this work, we present an in silico study of the arsenal of defense systems that RSSC harbors and their evolutionary history. For this purpose, we used a combination of genomic, phylogenetic and associative methods. We found that in addition to the CRISPR-Cas system already reported, there are eight other antiphage defense systems including the well-known Restriction-Modification and Toxin-Antitoxin systems. Furthermore, we found a tenth defense system, which is dedicated to reducing the incidence of plasmid transformation in bacteria. We undertook an analysis of the gene gain and loss patterns of the defense systems in 15 genomes of RSSC. Results indicate that the dynamics are inclined toward the gain of defense genes, as opposed to the rest of the genes, which were preferentially lost throughout evolution. This was confirmed by evidence of independent gene acquisition that has occurred by profuse horizontal transfer. The mutation and recombination rates were calculated as a proxy of evolutionary rates. Again, genes encoding the defense systems follow different rates of evolution with respect to the rest of the genes. These results lead us to conclude that the evolution of RSSC defense systems is highly dynamic and responds to a different evolutionary regime than the rest of the genes in the genomes of RSSC.
Affiliation(s)
- José A Castillo
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
| | - Henry Secaira-Morocho
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
| | - Stephanie Maldonado
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
| | - Katlheen N Sarmiento
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
| |
|
45
|
Koonin EV, Dolja VV, Krupovic M, Varsani A, Wolf YI, Yutin N, Zerbini FM, Kuhn JH. Global Organization and Proposed Megataxonomy of the Virus World. Microbiol Mol Biol Rev 2020; 84:e00061-19. [PMID: 32132243 PMCID: PMC7062200 DOI: 10.1128/mmbr.00061-19] [Citation(s) in RCA: 324] [Impact Index Per Article: 81.0] [Indexed: 12/25/2022]
Abstract
Viruses and mobile genetic elements are molecular parasites or symbionts that coevolve with nearly all forms of cellular life. The route of virus replication and protein expression is determined by the viral genome type. Comparison of these routes led to the classification of viruses into seven "Baltimore classes" (BCs) that define the major features of virus reproduction. However, recent phylogenomic studies identified multiple evolutionary connections among viruses within each of the BCs as well as between different classes. Due to the modular organization of virus genomes, these relationships defy simple representation as lines of descent but rather form complex networks. Phylogenetic analyses of virus hallmark genes combined with analyses of gene-sharing networks show that replication modules of five BCs (three classes of RNA viruses and two classes of reverse-transcribing viruses) evolved from a common ancestor that encoded an RNA-directed RNA polymerase or a reverse transcriptase. Bona fide viruses evolved from this ancestor on multiple, independent occasions via the recruitment of distinct cellular proteins as capsid subunits and other structural components of virions. The single-stranded DNA (ssDNA) viruses are a polyphyletic class, with different groups evolving by recombination between rolling-circle-replicating plasmids, which contributed the replication protein, and positive-sense RNA viruses, which contributed the capsid protein. The double-stranded DNA (dsDNA) viruses are distributed among several large monophyletic groups and arose via the combination of distinct structural modules with equally diverse replication modules. Phylogenomic analyses reveal the finer structure of evolutionary connections among RNA viruses and reverse-transcribing viruses, ssDNA viruses, and large subsets of dsDNA viruses. Taken together, these analyses allow us to outline the global organization of the virus world. 
Here, we describe the key aspects of this organization and propose a comprehensive hierarchical taxonomy of viruses.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Valerian V Dolja
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
| | - Mart Krupovic
- Institut Pasteur, Archaeal Virology Unit, Department of Microbiology, Paris, France
| | - Arvind Varsani
- The Biodesign Center for Fundamental and Applied Microbiomics, Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Structural Biology Research Unit, Department of Clinical Laboratory Sciences, University of Cape Town, Observatory, Cape Town, South Africa
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - F Murilo Zerbini
- Departamento de Fitopatologia/Bioagro, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
| | - Jens H Kuhn
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Frederick, Maryland, USA
| |
|
46
|
Huang X, Albou LP, Mushayahama T, Muruganujan A, Tang H, Thomas PD. Ancestral Genomes: a resource for reconstructed ancestral genes and genomes across the tree of life. Nucleic Acids Res 2019; 47:D271-D279. [PMID: 30371900 PMCID: PMC6323951 DOI: 10.1093/nar/gky1009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/16/2018] [Accepted: 10/10/2018] [Indexed: 11/23/2022]
Abstract
A growing number of whole genome sequencing projects, in combination with development of phylogenetic methods for reconstructing gene evolution, have provided us with a window into genomes that existed millions, and even billions, of years ago. Ancestral Genomes (http://ancestralgenomes.org) is a resource for comprehensive reconstructions of these ‘fossil genomes’. Comprehensive sets of protein-coding genes have been reconstructed for 78 genomes of now-extinct species that were the common ancestors of extant species from across the tree of life. The reconstructed genes are based on the extensive library of over 15 000 gene family trees from the PANTHER database, and are updated on a yearly basis. For each ancestral gene, we assign a stable identifier, and provide additional information designed to facilitate analysis: an inferred name, a reconstructed protein sequence, a set of inferred Gene Ontology (GO) annotations, and a ‘proxy gene’ for each ancestral gene, defined as the least-diverged descendant of the ancestral gene in a given extant genome. On the Ancestral Genomes website, users can browse the Ancestral Genomes by selecting nodes in a species tree, and can compare an extant genome with any of its reconstructed ancestors to understand how the genome evolved.
Affiliation(s)
- Xiaosong Huang
- School of Life Sciences, Guangzhou University, Guangzhou 510006, China
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
| | - Laurent-Philippe Albou
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
| | - Tremayne Mushayahama
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
| | - Anushya Muruganujan
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
| | - Haiming Tang
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
| |
|
47
|
Sevillya G, Doerr D, Lerner Y, Stoye J, Steel M, Snir S. Horizontal Gene Transfer Phylogenetics: A Random Walk Approach. Mol Biol Evol 2020; 37:1470-1479. [PMID: 31845962 DOI: 10.1093/molbev/msz302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results, assuming infinite size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned to 39 clusters by the rank of genus yields that the average number of genome dynamics events per gene in the phylogenetic depth of genus is around half with significant variability between genera. 
This result extends and confirms similar results obtained for individual genera in different manners.
Affiliation(s)
- Gur Sevillya
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| | - Daniel Doerr
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Yael Lerner
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| | - Jens Stoye
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Mike Steel
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| |
|
48
|
Maistrenko OM, Mende DR, Luetge M, Hildebrand F, Schmidt TSB, Li SS, Rodrigues JFM, von Mering C, Pedro Coelho L, Huerta-Cepas J, Sunagawa S, Bork P. Disentangling the impact of environmental and phylogenetic constraints on prokaryotic within-species diversity. ISME J 2020; 14:1247-1259. [PMID: 32047279 PMCID: PMC7174425 DOI: 10.1038/s41396-020-0600-z] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 08/02/2019] [Revised: 01/21/2020] [Accepted: 01/27/2020] [Indexed: 12/04/2022]
Abstract
Microbial organisms inhabit virtually all environments and encompass a vast biological diversity. The pangenome concept aims to facilitate an understanding of diversity within defined phylogenetic groups. Hence, pangenomes are increasingly used to characterize the strain diversity of prokaryotic species. To understand the interdependence of pangenome features (such as the number of core and accessory genes) and to study the impact of environmental and phylogenetic constraints on the evolution of conspecific strains, we computed pangenomes for 155 phylogenetically diverse species (from ten phyla) using 7,000 high-quality genomes to each of which the respective habitats were assigned. Species habitat ubiquity was associated with several pangenome features. In particular, core-genome size was more important for ubiquity than accessory genome size. In general, environmental preferences had a stronger impact on pangenome evolution than phylogenetic inertia. Environmental preferences explained up to 49% of the variance for pangenome features, compared with 18% by phylogenetic inertia. This observation was robust when the dataset was extended to 10,100 species (59 phyla). The importance of environmental preferences was further accentuated by convergent evolution of pangenome features in a given habitat type across different phylogenetic clades. For example, the soil environment promotes expansion of pangenome size, while host-associated habitats lead to its reduction. Taken together, we explored the global principles of pangenome evolution, quantified the influence of habitat, and phylogenetic inertia on the evolution of pangenomes and identified criteria governing species ubiquity and habitat specificity.
Affiliation(s)
- Oleksandr M Maistrenko
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
| | - Daniel R Mende
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Laboratory of Applied Evolutionary Biology, Department of Medical Microbiology, Academic Medical Centre, University of Amsterdam, Amsterdam, 1105 AZ, The Netherlands
| | - Mechthild Luetge
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Institute of Immunobiology, Kantonsspital St. Gallen, 9007, St. Gallen, Switzerland
| | - Falk Hildebrand
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, Norfolk, UK
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
| | - Thomas S B Schmidt
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
| | - Simone S Li
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kongens Lyngby, Denmark
| | - João F Matias Rodrigues
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057, Zurich, Switzerland
| | - Christian von Mering
- Department of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, CH-8057, Zurich, Switzerland
| | - Luis Pedro Coelho
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433, China
| | - Jaime Huerta-Cepas
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
| | - Shinichi Sunagawa
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Department of Biology and Swiss Institute of Bioinformatics, ETH Zürich, Vladimir-Prelog-Weg 4, 8093, Zürich, Switzerland
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, 69117, Heidelberg, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin, Germany
- Molecular Medicine Partnership Unit, University of Heidelberg and European Molecular Biology Laboratory, Heidelberg, Germany
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
| |
|
49
|
Galperin MY, Kristensen DM, Makarova KS, Wolf YI, Koonin EV. Microbial genome analysis: the COG approach. Brief Bioinform 2019; 20:1063-1070. [PMID: 28968633 DOI: 10.1093/bib/bbx117] [Citation(s) in RCA: 152] [Impact Index Per Article: 38.0] [Received: 05/30/2017] [Revised: 08/01/2017] [Indexed: 11/15/2022]
Abstract
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
|
50
|
Zhang Y, Zhang Z, Zhang H, Zhao Y, Zhang Z, Xiao J. PADS Arsenal: a database of prokaryotic defense systems related genes. Nucleic Acids Res 2020; 48:D590-D598. [PMID: 31620779 PMCID: PMC7145686 DOI: 10.1093/nar/gkz916] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 08/12/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019] [Indexed: 12/16/2022]
Abstract
Defense systems are vital weapons for prokaryotes to resist heterologous DNA and survive the constant invasion of viruses, and they are widely used in biochemistry investigation and antimicrobial drug research. So far, numerous types of defense systems have been discovered, but there is no comprehensive defense systems database to organize prokaryotic defense gene datasets. To fill this gap, we unveil the prokaryotic antiviral defense system (PADS) Arsenal (https://bigd.big.ac.cn/padsarsenal), a public database dedicated to gathering, storing, analyzing and visualizing prokaryotic defense gene datasets. The initial version of PADS Arsenal integrates 18 distinctive categories of defense system with the annotation of 6 600 264 genes retrieved from 63 701 genomes across 33 390 species of archaea and bacteria. PADS Arsenal provides various ways to retrieve information on defense-system-related genes and to visualize it in multiple function modes. Moreover, an online analysis pipeline is integrated into PADS Arsenal to facilitate annotation and evolutionary analysis of defense genes. PADS Arsenal can also visualize the dynamic variation information of defense genes from pan-genome analysis. Overall, PADS Arsenal is a state-of-the-art open comprehensive resource to accelerate the research of prokaryotic defense systems.
Affiliation(s)
- Yadong Zhang
- National Genomics Data Center, Beijing 100101, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhewen Zhang
- National Genomics Data Center, Beijing 100101, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
| | - Hao Zhang
- National Genomics Data Center, Beijing 100101, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yongbing Zhao
- Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL 32224, USA
| | - Zaichao Zhang
- Department of Biology, The University of Western Ontario, London, Ontario N6A 5B7, Canada
| | - Jingfa Xiao
- National Genomics Data Center, Beijing 100101, China
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| |
|