1
Beavan A, Domingo-Sananes MR, McInerney JO. Contingency, repeatability, and predictability in the evolution of a prokaryotic pangenome. Proc Natl Acad Sci U S A 2024; 121:e2304934120. [PMID: 38147560 PMCID: PMC10769857 DOI: 10.1073/pnas.2304934120] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 03/27/2023] [Accepted: 11/05/2023] [Indexed: 12/28/2023]
Abstract
Pangenomes exhibit remarkable variability in many prokaryotic species, much of which is maintained through the processes of horizontal gene transfer and gene loss. Repeated acquisitions of near-identical homologs can easily be observed across pangenomes, leading to the question of whether these parallel events potentiate similar evolutionary trajectories, or whether the remarkably different genetic backgrounds of the recipients mean that postacquisition evolutionary trajectories end up being quite different. In this study, we present a machine learning method that predicts the presence or absence of genes in the Escherichia coli pangenome based on complex patterns of the presence or absence of other accessory genes within a genome. Our analysis leverages the repeated transfer of genes through the E. coli pangenome to observe patterns of repeated evolution following similar events. We find that the presence or absence of a substantial set of genes is highly predictable from other genes alone, indicating that selection potentiates and maintains gene-gene co-occurrence and avoidance relationships deterministically over long-term bacterial evolution and is robust to differences in host evolutionary history. We propose that at least part of the pangenome can be understood as a set of genes with relationships that govern their likely cohabitants, analogous to an ecosystem's set of interacting organisms. Our findings indicate that intragenomic gene fitness effects may be key drivers of prokaryotic evolution, influencing the repeated emergence of complex gene-gene relationships across the pangenome.
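As a rough illustration of predicting one gene's presence or absence from the other accessory genes in a genome, the sketch below uses a small nearest-neighbour vote over Hamming distance. The abstract does not name the model the authors used, so the classifier, the function name `predict_gene`, and the toy profiles are all assumptions made for illustration only.

```python
def predict_gene(train_profiles, train_labels, query_profile, k=3):
    """Majority vote among the k training genomes closest in Hamming distance.

    train_profiles: list of 0/1 lists (presence/absence of the *other* genes).
    train_labels:   0/1 presence of the target gene in each training genome.
    """
    distances = sorted(
        (sum(a != b for a, b in zip(profile, query_profile)), index)
        for index, profile in enumerate(train_profiles)
    )
    votes = [train_labels[index] for _, index in distances[:k]]
    # Predict "present" only when a strict majority of neighbours carry the gene.
    return 1 if sum(votes) * 2 > len(votes) else 0


# Hypothetical toy data: five genomes, three predictor genes each.
train_profiles = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
train_labels = [1, 1, 0, 0, 1]

prediction_a = predict_gene(train_profiles, train_labels, [1, 1, 0])
prediction_b = predict_gene(train_profiles, train_labels, [0, 0, 1])
```

A gene whose presence can be recovered this way from held-out genomes is "predictable from other genes" in the loose sense used here; a real analysis would need far more genomes and proper cross-validation.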
Affiliation(s)
- Alan Beavan
  - School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
- Maria Rosa Domingo-Sananes
  - School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom
  - School of Science and Technology, Nottingham Trent University, Nottingham NG1 4FQ, United Kingdom
- James O. McInerney
  - School of Life Sciences, The University of Nottingham, Nottingham NG7 2UH, United Kingdom

2
Funabiki H, Wassing IE, Jia Q, Luo JD, Carroll T. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. eLife 2023; 12:RP86721. [PMID: 37769127 PMCID: PMC10538959 DOI: 10.7554/elife.86721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/30/2023]
Abstract
5-Methylcytosine (5mC) and DNA methyltransferases (DNMTs) are broadly conserved in eukaryotes but are also frequently lost during evolution. The mammalian SNF2 family ATPase HELLS and its plant ortholog DDM1 are critical for maintaining 5mC. Mutations in HELLS, its activator CDCA7, and the de novo DNA methyltransferase DNMT3B, cause immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome, a genetic disorder associated with the loss of DNA methylation. We here examine the coevolution of CDCA7, HELLS and DNMTs. While DNMT3, the maintenance DNA methyltransferase DNMT1, HELLS, and CDCA7 are all highly conserved in vertebrates and green plants, they are frequently co-lost in other evolutionary clades. The presence-absence patterns of these genes are not random; almost all CDCA7 harboring eukaryote species also have HELLS and DNMT1 (or another maintenance methyltransferase, DNMT5). Coevolution of presence-absence patterns (CoPAP) analysis in Ecdysozoa further indicates coevolutionary linkages among CDCA7, HELLS, DNMT1 and its activator UHRF1. We hypothesize that CDCA7 becomes dispensable in species that lost HELLS or DNA methylation, and/or the loss of CDCA7 triggers the replacement of DNA methylation by other chromatin regulation mechanisms. Our study suggests that a unique specialized role of CDCA7 in HELLS-dependent DNA methylation maintenance is broadly inherited from the last eukaryotic common ancestor.
Affiliation(s)
- Hironori Funabiki
  - Laboratory of Chromosome and Cell Biology, The Rockefeller University, New York, United States
- Isabel E Wassing
  - Laboratory of Chromosome and Cell Biology, The Rockefeller University, New York, United States
- Qingyuan Jia
  - Laboratory of Chromosome and Cell Biology, The Rockefeller University, New York, United States
- Ji-Dung Luo
  - Bioinformatics Resource Center, The Rockefeller University, New York, United States
- Thomas Carroll
  - Bioinformatics Resource Center, The Rockefeller University, New York, United States

3
Funabiki H, Wassing IE, Jia Q, Luo JD, Carroll T. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. bioRxiv 2023:2023.01.30.526367. [PMID: 36778482 PMCID: PMC9915587 DOI: 10.1101/2023.01.30.526367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
5-Methylcytosine (5mC) and DNA methyltransferases (DNMTs) are broadly conserved in eukaryotes but are also frequently lost during evolution. The mammalian SNF2 family ATPase HELLS and its plant ortholog DDM1 are critical for maintaining 5mC. Mutations in HELLS, its activator CDCA7, and the de novo DNA methyltransferase DNMT3B, cause immunodeficiency-centromeric instability-facial anomalies (ICF) syndrome, a genetic disorder associated with the loss of DNA methylation. We here examine the coevolution of CDCA7, HELLS and DNMTs. While DNMT3, the maintenance DNA methyltransferase DNMT1, HELLS, and CDCA7 are all highly conserved in vertebrates and green plants, they are frequently co-lost in other evolutionary clades. The presence-absence patterns of these genes are not random; almost all CDCA7 harboring eukaryote species also have HELLS and DNMT1 (or another maintenance methyltransferase, DNMT5). Coevolution of presence-absence patterns (CoPAP) analysis in Ecdysozoa further indicates coevolutionary linkages among CDCA7, HELLS, DNMT1 and its activator UHRF1. We hypothesize that CDCA7 becomes dispensable in species that lost HELLS or DNA methylation, and/or the loss of CDCA7 triggers the replacement of DNA methylation by other chromatin regulation mechanisms. Our study suggests that a unique specialized role of CDCA7 in HELLS-dependent DNA methylation maintenance is broadly inherited from the last eukaryotic common ancestor.
Affiliation(s)
- Hironori Funabiki
  - Laboratory of Chromosome and Cell Biology, The Rockefeller University, New York, NY 10065
- Isabel E. Wassing
  - Laboratory of Chromosome and Cell Biology, The Rockefeller University, New York, NY 10065
- Qingyuan Jia
  - Laboratory of Chromosome and Cell Biology, The Rockefeller University, New York, NY 10065
- Ji-Dung Luo
  - Bioinformatics Resource Center, The Rockefeller University, New York, NY 10065
- Thomas Carroll
  - Bioinformatics Resource Center, The Rockefeller University, New York, NY 10065

4
Mehta RS, Petit RA, Read TD, Weissman DB. Detecting patterns of accessory genome coevolution in Staphylococcus aureus using data from thousands of genomes. BMC Bioinformatics 2023; 24:243. [PMID: 37296404 PMCID: PMC10251594 DOI: 10.1186/s12859-023-05363-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/02/2023] [Accepted: 05/26/2023] [Indexed: 06/12/2023]
Abstract
Bacterial genomes exhibit widespread horizontal gene transfer, resulting in highly variable genome content that complicates the inference of genetic interactions. In this study, we develop a method for detecting coevolving genes from large datasets of bacterial genomes based on pairwise comparisons of closely related individuals, analogous to a pedigree study in eukaryotic populations. We apply our method to pairs of genes from the Staphylococcus aureus accessory genome of over 75,000 annotated gene families using a database of over 40,000 whole genomes. We find many pairs of genes that appear to be gained or lost in a coordinated manner, as well as pairs where the gain of one gene is associated with the loss of the other. These pairs form networks of rapidly coevolving genes, primarily consisting of genes involved in virulence, mechanisms of horizontal gene transfer, and antibiotic resistance, particularly the SCCmec complex. While we focus on gene gain and loss, our method can also detect genes that tend to acquire substitutions in tandem, or genotype-phenotype or phenotype-phenotype coevolution. Finally, we present the R package DeCoTUR that allows for the computation of our method.
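The idea of comparing closely related genomes pair by pair can be sketched as a simple tally: within each close pair, check whether both genes differ, and if so whether they changed in the same direction (coordinated gain or loss) or in opposite directions (gain of one alongside loss of the other). This is only a toy counting scheme under assumed inputs; it is not the scoring used by the DeCoTUR package, and the function name `coordinated_change_counts`, the genome labels, and the data are hypothetical.

```python
def coordinated_change_counts(close_pairs, gene_a, gene_b):
    """Count close genome pairs in which both genes differ, split by direction.

    close_pairs: iterable of (genome_i, genome_j) identifiers.
    gene_a, gene_b: dicts mapping genome identifier -> 0/1 presence.
    Returns (concordant, discordant).
    """
    concordant = discordant = 0
    for i, j in close_pairs:
        diff_a = gene_a[i] - gene_a[j]
        diff_b = gene_b[i] - gene_b[j]
        if diff_a == 0 or diff_b == 0:
            continue  # at least one gene is unchanged within this pair
        if diff_a == diff_b:
            concordant += 1  # both present in one genome, both absent in the other
        else:
            discordant += 1  # one gene present exactly where the other is absent
    return concordant, discordant


# Hypothetical presence/absence calls for two accessory genes in six genomes.
gene_a = {"g1": 1, "g2": 0, "g3": 1, "g4": 1, "g5": 0, "g6": 1}
gene_b = {"g1": 1, "g2": 0, "g3": 0, "g4": 1, "g5": 1, "g6": 0}
close_pairs = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]

counts = coordinated_change_counts(close_pairs, gene_a, gene_b)
```

Restricting the tally to close pairs is what limits the influence of shared ancestry: a difference within a close pair reflects a recent gain or loss rather than deep clade structure.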
Affiliation(s)
- Rohan S Mehta
  - Department of Physics, Emory University, Atlanta, GA, USA
- Robert A Petit
  - Division of Infectious Diseases, Department of Medicine, School of Medicine, Emory University, Atlanta, GA, USA
  - Wyoming Public Health Laboratory, Cheyenne, WY, USA
- Timothy D Read
  - Division of Infectious Diseases, Department of Medicine, School of Medicine, Emory University, Atlanta, GA, USA
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, GA, USA

5
Liu C, Kenney T, Beiko RG, Gu H. The Community Coevolution Model with Application to the Study of Evolutionary Relationships between Genes based on Phylogenetic Profiles. Syst Biol 2022:6651862. [PMID: 35904761 DOI: 10.1093/sysbio/syac052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Revised: 07/15/2022] [Accepted: 07/19/2022] [Indexed: 11/13/2022]
Abstract
Organismal traits can evolve in a coordinated way, with correlated patterns of gains and losses reflecting important evolutionary associations. Discovering these associations can reveal important information about the functional and ecological linkages among traits. Phylogenetic profiles treat individual genes as traits distributed across sets of genomes and can provide a fine-grained view of the genetic underpinnings of evolutionary processes in a set of genomes. Phylogenetic profiling has been used to identify genes that are functionally linked, and to identify common patterns of lateral gene transfer in microorganisms. However, comparative analysis of phylogenetic profiles and other trait distributions should take into account the phylogenetic relationships among the organisms under consideration. Here we propose the Community Coevolution Model (CCM), a new coevolutionary model to analyze the evolutionary associations among traits, with a focus on phylogenetic profiles. In the CCM, traits are considered to evolve as a community with interactions, and the transition rate for each trait depends on the current states of other traits. Surpassing other comparative methods for pairwise trait analysis, CCM has the additional advantage of being able to examine multiple traits as a community to reveal more dependency relationships. We also develop a simulation procedure to generate phylogenetic profiles with correlated evolutionary patterns that can be used as benchmark data for evaluation purposes. A simulation study demonstrates that CCM is more accurate than other methods including the Jaccard Index and three tree-aware methods. The parameterization of CCM makes the interpretation of the relations between genes more direct, which leads to Darwin's scenario being identified easily based on the estimated parameters. We show that CCM is more efficient and fits real data better than other methods resulting in higher likelihood scores with fewer parameters. 
An examination of 3786 phylogenetic profiles across a set of 659 bacterial genomes highlights linkages between genes with common functions, including many patterns that would not have been identified under a non-phylogenetic model of common distribution. We also applied the CCM to 44 proteins in the well-studied Mitochondrial Respiratory Complex I and recovered associations that mapped well onto the structural associations that exist in the complex.
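The Jaccard Index mentioned in the abstract as a comparison method is the simplest way to score two phylogenetic profiles: the number of genomes carrying both genes divided by the number carrying at least one. The sketch below is a minimal version with made-up profiles; it ignores the tree entirely, which is the limitation that tree-aware methods such as the CCM are meant to address.

```python
def jaccard_index(profile_a, profile_b):
    """Jaccard index of two binary presence/absence profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two genes absent everywhere share no information; score them as 0.
    return both / either if either else 0.0


# Hypothetical profiles of two genes across six genomes.
gene_x = [1, 1, 0, 1, 0, 0]
gene_y = [1, 1, 0, 0, 0, 1]
score = jaccard_index(gene_x, gene_y)  # 2 shared genomes out of 4 with either gene
```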
Affiliation(s)
- Chaoyue Liu
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada
  - Faculty of Computer Science, Dalhousie University, Halifax, B3H 4R2, Canada
- Toby Kenney
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada
- Robert G Beiko
  - Faculty of Computer Science, Dalhousie University, Halifax, B3H 4R2, Canada
- Hong Gu
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, B3H 4R2, Canada

6
White H, Vos M, Sheppard SK, Pascoe B, Raymond B. Signatures of selection in core and accessory genomes indicate different ecological drivers of diversification among Bacillus cereus clades. Mol Ecol 2022; 31:3584-3597. [PMID: 35510788 PMCID: PMC9324797 DOI: 10.1111/mec.16490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/26/2021] [Revised: 03/31/2022] [Accepted: 04/12/2022] [Indexed: 11/30/2022]
Abstract
Bacterial clades are often ecologically distinct, despite extensive horizontal gene transfer (HGT). How selection works on different parts of bacterial pan-genomes to drive and maintain the emergence of clades is unclear. Focusing on the three largest clades in the diverse and well-studied Bacillus cereus sensu lato group, we identified clade-specific core genes (present in all clade members) and then used clade-specific allelic diversity to identify genes under purifying and diversifying selection. Clade-specific accessory genes (present in a subset of strains within a clade) were characterized as being under selection using presence/absence in specific clades. Gene ontology analyses of genes under selection revealed that different gene functions were enriched in different clades. Furthermore, some gene functions were enriched only amongst clade-specific core or accessory genomes. Genes under purifying selection were often clade-specific, while genes under diversifying selection showed signs of frequent HGT. These patterns are consistent with different selection pressures acting on both the core and the accessory genomes of different clades and can lead to ecological divergence in both cases. Examining variation in allelic diversity allows us to uncover genes under clade-specific selection, allowing ready identification of strains and their ecological niche.
Affiliation(s)
- Hugh White
  - Centre for Ecology and Conservation, University of Exeter, Penryn, UK
- Michiel Vos
  - European Centre for Environment and Human Health, University of Exeter Medical School, Environment and Sustainability Institute, Penryn Campus, UK
- Samuel K. Sheppard
  - Milner Centre for Evolution, Department of Biology & Biotechnology, University of Bath, Bath, UK
- Ben Pascoe
  - Milner Centre for Evolution, Department of Biology & Biotechnology, University of Bath, Bath, UK
- Ben Raymond
  - Centre for Ecology and Conservation, University of Exeter, Penryn, UK

7
Khandai K, Navarro-Martinez C, Smith B, Buonopane R, Byun SA, Patterson M. Determining Significant Correlation Between Pairs of Extant Characters in a Small Parsimony Framework. J Comput Biol 2022; 29:1132-1154. [PMID: 35723627 DOI: 10.1089/cmb.2022.0141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
When studying the evolutionary relationships among a set of species, the principle of parsimony states that a relationship involving the fewest number of evolutionary events is likely the correct one. Due to its simplicity, this principle was formalized in the context of computational evolutionary biology decades ago by, for example, Fitch and Sankoff. Because the parsimony framework does not require a model of evolution, unlike maximum likelihood or Bayesian approaches, it is often a good starting point when no reasonable estimate of such a model is available. In this work, we devise a method for determining if pairs of discrete characters are significantly correlated across all most parsimonious reconstructions, given a set of species on these characters, and an evolutionary tree. The first step of this method is to use Sankoff's algorithm to compute all most parsimonious assignments of ancestral states (of each character) to the internal nodes of the phylogeny. Correlation between a pair of evolutionary events (e.g., absent to present) for a pair of characters is then determined by the (co-) occurrence patterns between the sets of their respective ancestral assignments. The probability of obtaining a correlation this extreme (or more) under a null hypothesis where the events happen randomly on the evolutionary tree is then used to assess the significance of this correlation. We implement this method: parcours (PARsimonious CO-occURrenceS) and use it to identify significantly correlated evolution among vocalizations and morphological characters in the Felidae family.
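The first step described above, Sankoff's algorithm for small parsimony, can be sketched for a single binary character as a bottom-up dynamic programme. The version below computes only the minimum number of state changes (the parsimony score) with unit costs; it does not enumerate all most parsimonious ancestral assignments as the abstract describes, and the tree encoding (nested tuples with leaf names as strings) and the example data are assumptions made for illustration.

```python
import math

STATES = (0, 1)  # 0 = character absent, 1 = character present


def sankoff(tree, leaf_states, cost=lambda a, b: 0 if a == b else 1):
    """Return {state: minimum cost of the subtree if its root takes that state}.

    tree: a leaf name (str) or a tuple of subtrees.
    leaf_states: dict mapping leaf name -> observed state.
    """
    if isinstance(tree, str):
        # A leaf is free in its observed state and impossible in any other.
        return {s: (0 if leaf_states[tree] == s else math.inf) for s in STATES}
    child_tables = [sankoff(child, leaf_states, cost) for child in tree]
    return {
        s: sum(min(cost(s, t) + table[t] for t in STATES) for table in child_tables)
        for s in STATES
    }


# Hypothetical five-leaf tree and character states.
tree = (("A", "B"), ("C", ("D", "E")))
leaf_states = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1}

root_costs = sankoff(tree, leaf_states)
parsimony_score = min(root_costs.values())  # two changes are needed on this tree
```

Here both root states tie at a cost of 2, which is exactly the kind of ambiguity that motivates considering all most parsimonious reconstructions rather than a single one.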
Affiliation(s)
- Kaustubh Khandai
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
- Brendan Smith
  - Department of Biology, Fairfield University, Fairfield, Connecticut, USA
- Rebecca Buonopane
  - Department of Biology, Fairfield University, Fairfield, Connecticut, USA
- Soyong Ashley Byun
  - Department of Biology, Fairfield University, Fairfield, Connecticut, USA
- Murray Patterson
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA

8
Fukunaga T, Iwasaki W. Inverse Potts model improves accuracy of phylogenetic profiling. Bioinformatics 2022; 38:1794-1800. [PMID: 35060594 PMCID: PMC8963296 DOI: 10.1093/bioinformatics/btac034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 01/11/2022] [Accepted: 01/13/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Phylogenetic profiling is a powerful computational method for revealing the functions of function-unknown genes. Although conventional similarity metrics in phylogenetic profiling achieved high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. While previous studies reduced the evolutionary bias by considering a phylogenetic tree, few studies have analyzed the spurious correlation bias.
RESULTS: To reduce the spurious correlation bias, we developed metrics based on the inverse Potts model (IPM) for phylogenetic profiling. We also developed a metric based on both the IPM and a phylogenetic tree. In an empirical dataset analysis, we demonstrated that these IPM-based metrics improved the prediction performance of phylogenetic profiling. In addition, we found that the integration of several metrics, including the IPM-based metrics, had superior performance to a single metric.
AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/fukunagatsu/Ipm.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wataru Iwasaki
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 1130032, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan
  - Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba 2770882, Japan
  - Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 1130032, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 1130032, Japan

9
On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Comput Biol 2021; 17:e1008957. [PMID: 34029316 PMCID: PMC8177639 DOI: 10.1371/journal.pcbi.1008957] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/15/2020] [Revised: 06/04/2021] [Accepted: 04/09/2021] [Indexed: 12/04/2022]
Abstract
Coevolution-based contact prediction, either directly by coevolutionary couplings resulting from global statistical sequence models or using structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions in global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from data. In the case of protein families, this assumption is obviously violated by phylogenetic relations between protein sequences. It has turned out to be notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data, such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are compatible in size with the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
Many homologous protein families contain thousands of highly diverged amino-acid sequences, which fold into close-to-identical three-dimensional structures and fulfill almost identical biological tasks. Global coevolutionary models, like those inferred by the Direct Coupling Analysis (DCA), assume that families can be considered as samples of some unknown statistical model, and that the parameters of these models represent evolutionary constraints acting on protein sequences. To learn these models from data, DCA and related approaches have to also assume that the distinct sequences in a protein family are close to independent, while in reality they are characterized by involved hierarchical phylogenetic relationships. Here we propose null models for sequence alignments, which maintain patterns of amino-acid conservation and phylogeny contained in the data, but destroy any coevolutionary couplings, frequently used in protein structure prediction. We find that phylogeny actually induces spurious non-zero couplings. These are, however, significantly smaller than the largest couplings derived from natural sequences, and therefore have only little influence on the first predicted contacts. However, in the range of intermediate couplings, they may lead to statistically significant effects. Dissecting phylogenetic from functional couplings might therefore extend the range of accurately predicted structural contacts down to smaller coupling strengths than those currently used.

10
Hall RJ, Whelan FJ, McInerney JO, Ou Y, Domingo-Sananes MR. Horizontal Gene Transfer as a Source of Conflict and Cooperation in Prokaryotes. Front Microbiol 2020; 11:1569. [PMID: 32849327 PMCID: PMC7396663 DOI: 10.3389/fmicb.2020.01569] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 02/19/2020] [Accepted: 06/17/2020] [Indexed: 02/01/2023]
Abstract
Horizontal gene transfer (HGT) is one of the most important processes in prokaryote evolution. The sharing of DNA can spread neutral or beneficial genes, as well as genetic parasites across populations and communities, creating a large proportion of the variability acted on by natural selection. Here, we highlight the role of HGT in enhancing the opportunities for conflict and cooperation within and between prokaryote genomes. We discuss how horizontally acquired genes can cooperate or conflict both with each other and with a recipient genome, resulting in signature patterns of gene co-occurrence, avoidance, and dependence. We then describe how interactions involving horizontally transferred genes may influence cooperation and conflict at higher levels (populations, communities, and symbioses). Finally, we consider the benefits and drawbacks of HGT for prokaryotes and its fundamental role in understanding conflict and cooperation from the gene-gene to the microbiome level.
Affiliation(s)
- Rebecca J Hall
  - School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
- Fiona J Whelan
  - School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
- James O McInerney
  - School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, United Kingdom
- Yaqing Ou
  - Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, United Kingdom

11
Whelan FJ, Rusilowicz M, McInerney JO. Coinfinder: detecting significant associations and dissociations in pangenomes. Microb Genom 2020; 6:e000338. [PMID: 32100706 PMCID: PMC7200068 DOI: 10.1099/mgen.0.000338] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/02/2019] [Accepted: 01/23/2020] [Indexed: 12/16/2022]
Abstract
The accessory genes of prokaryote and eukaryote pangenomes accumulate by horizontal gene transfer, differential gene loss, and the effects of selection and drift. We have developed Coinfinder, a software program that assesses whether sets of homologous genes (gene families) in pangenomes associate or dissociate with each other (i.e. are 'coincident') more often than would be expected by chance. Coinfinder employs a user-supplied phylogenetic tree in order to assess the lineage-dependence (i.e. the phylogenetic distribution) of each accessory gene, allowing Coinfinder to focus on coincident gene pairs whose joint presence is not simply because they happened to appear in the same clade, but rather that they tend to appear together more often than expected across the phylogeny. Coinfinder is implemented in C++, Python3 and R and is freely available under the GNU license from https://github.com/fwhelan/coinfinder.
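To make "more often than would be expected by chance" concrete, the sketch below computes a one-sided hypergeometric tail probability for the number of genomes in which two gene families co-occur, assuming the two families are distributed independently. This is a generic association test chosen for illustration; it is not Coinfinder's implementation, it ignores the phylogenetic tree and the lineage-dependence step described above, and the function name `cooccurrence_p_value` and the data are hypothetical.

```python
from math import comb


def cooccurrence_p_value(profile_a, profile_b):
    """P(co-occurrences >= observed) under independence (hypergeometric tail)."""
    n = len(profile_a)          # number of genomes
    k_a = sum(profile_a)        # genomes carrying gene family A
    k_b = sum(profile_b)        # genomes carrying gene family B
    observed = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    total = comb(n, k_b)        # ways to place family B among n genomes
    tail = sum(
        comb(k_a, x) * comb(n - k_a, k_b - x)
        for x in range(observed, min(k_a, k_b) + 1)
    )
    return tail / total


# Hypothetical profiles: two families found in exactly the same four of eight genomes.
family_a = [1, 1, 1, 1, 0, 0, 0, 0]
family_b = [1, 1, 1, 1, 0, 0, 0, 0]
p_value = cooccurrence_p_value(family_a, family_b)  # 1 / comb(8, 4)
```

A small p-value here may still reflect nothing more than shared ancestry if all four genomes sit in one clade, which is why a tree-aware filter of the kind the abstract describes matters.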
Affiliation(s)
- Fiona Jane Whelan
  - School of Life Sciences, The University of Nottingham, Nottingham, UK
- Martin Rusilowicz
  - Faculty of Biology, Medicine & Health, The University of Manchester, Manchester, UK
- James Oscar McInerney
  - School of Life Sciences, The University of Nottingham, Nottingham, UK
  - Faculty of Biology, Medicine & Health, The University of Manchester, Manchester, UK

12
Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022]
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
Affiliation(s)
- Giancarlo Croce
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Maria Virginia Ruiz Cuevas
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Victoria Keidel
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, United States of America
- Matteo Figliuzzi
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Hendrik Szurmant
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, United States of America
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France

13
Oteri F, Nadalin F, Champeimont R, Carbone A. BIS2Analyzer: a server for co-evolution analysis of conserved protein families. Nucleic Acids Res 2017; 45:W307-W314. [PMID: 28472458 PMCID: PMC5570204 DOI: 10.1093/nar/gkx336] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 02/11/2017] [Accepted: 04/18/2017] [Indexed: 12/13/2022]
Abstract
Along protein sequences, co-evolution analysis identifies residue pairs demonstrating either a specific co-adaptation, where changes in one of the residues are compensated by changes in the other during evolution, or a less specific external force that affects the evolutionary rates of both residues to a similar degree. In both cases, independently of the underlying cause, co-evolutionary signatures within or between proteins serve as markers of physical interactions and/or functional relationships. Depending on the type of protein under study, the set of available homologous sequences may greatly differ in size and amino acid variability. BIS2Analyzer, openly accessible at http://www.lcqb.upmc.fr/BIS2Analyzer/, is a web server providing the online analysis of co-evolving amino-acid pairs in protein alignments, especially designed for vertebrate and viral protein families, which typically display a small number of highly similar sequences. It is based on BIS2, a re-implemented fast version of the co-evolution analysis tool Blocks in Sequences (BIS). BIS2Analyzer provides a rich and interactive graphical interface to ease biological interpretation of the results.
Affiliation(s)
- Francesco Oteri
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Francesca Nadalin
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Raphaël Champeimont
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Alessandra Carbone
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
  - Institut Universitaire de France, 75005 Paris, France
|
14
|
Liu C, Wright B, Allen-Vercoe E, Gu H, Beiko R. Phylogenetic Clustering of Genes Reveals Shared Evolutionary Trajectories and Putative Gene Functions. Genome Biol Evol 2018; 10:2255-2265. [PMID: 30137329 PMCID: PMC6130602 DOI: 10.1093/gbe/evy178] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Accepted: 08/17/2018] [Indexed: 11/20/2022] Open
Abstract
Homologous genes in prokaryotes can be described using phylogenetic profiles, which summarize their patterns of presence or absence across a set of genomes. Phylogenetic profiles have been used for nearly twenty years to cluster genes based on measures such as the Euclidean distance between profile vectors. However, most approaches do not take into account the phylogenetic relationships amongst the profiled genomes, and overrepresentation of certain taxonomic groups (e.g., pathogenic species with many sequenced representatives) can skew the interpretation of profiles. We propose a new approach that uses a coevolutionary method defined by Pagel to account for the phylogenetic relationships amongst target organisms, and a hierarchical-clustering approach to define sets of genes with common distributions across the organisms. The clusters we obtain using our method show greater evidence of phylogenetic and functional clustering than a recently published approach based on hidden Markov models. Our clustering method identifies sets of amino-acid biosynthesis genes that constitute cohesive pathways, and motility/chemotaxis genes with common histories of descent and lateral gene transfer.
Affiliation(s)
- Chaoyue Liu
  - Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Faculty of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Benjamin Wright
  - Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
- Emma Allen-Vercoe
  - Department of Molecular and Cellular Biology, University of Guelph, Ontario, Canada
- Hong Gu
  - Department of Mathematics and Statistics, Faculty of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Robert Beiko
  - Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
|
15
|
Katz LA. Recent events dominate interdomain lateral gene transfers between prokaryotes and eukaryotes and, with the exception of endosymbiotic gene transfers, few ancient transfer events persist. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140324. [PMID: 26323756 DOI: 10.1098/rstb.2014.0324] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022] Open
Abstract
While there is compelling evidence for the impact of endosymbiotic gene transfer (EGT; transfer from either mitochondrion or chloroplast to the nucleus) on genome evolution in eukaryotes, the role of interdomain transfer from bacteria and/or archaea (i.e. prokaryotes) is less clear. Lateral gene transfers (LGTs) have been argued to be potential sources of phylogenetic information, particularly for reconstructing deep nodes that are difficult to recover with traditional phylogenetic methods. We sought to identify interdomain LGTs by using a phylogenomic pipeline that generated 13 465 single gene trees and included up to 487 eukaryotes, 303 bacteria and 118 archaea. Our goals include searching for LGTs that unite major eukaryotic clades, and describing the relative contributions of LGT and EGT across the eukaryotic tree of life. Given the difficulties in interpreting single gene trees that aim to capture the approximately 1.8 billion years of eukaryotic evolution, we focus on presence-absence data to identify interdomain transfer events. Specifically, we identify 1138 genes found only in prokaryotes and representatives of three or fewer major clades of eukaryotes (e.g. Amoebozoa, Archaeplastida, Excavata, Opisthokonta, SAR and orphan lineages). The majority of these genes have phylogenetic patterns that are consistent with recent interdomain LGTs and, with the notable exception of EGTs involving photosynthetic eukaryotes, we detect few ancient interdomain LGTs. These analyses suggest that LGTs have probably occurred throughout the history of eukaryotes, but that ancient events are not maintained unless they are associated with endosymbiotic gene transfer among photosynthetic lineages.
Affiliation(s)
- Laura A Katz
  - Department of Biological Sciences, Smith College, Northampton, MA 01063, USA
  - Program in Organismic and Evolutionary Biology, UMass-Amherst, Amherst, MA 01003, USA
|
16
|
Burstein D, Amaro F, Zusman T, Lifshitz Z, Cohen O, Gilbert JA, Pupko T, Shuman HA, Segal G. Genomic analysis of 38 Legionella species identifies large and diverse effector repertoires. Nat Genet 2016; 48:167-75. [PMID: 26752266 DOI: 10.1038/ng.3481] [Citation(s) in RCA: 203] [Impact Index Per Article: 22.6] [Received: 06/22/2015] [Accepted: 12/08/2015] [Indexed: 11/09/2022]
Abstract
Infection by the human pathogen Legionella pneumophila relies on the translocation of ~300 virulence proteins, termed effectors, which manipulate host cell processes. However, almost no information exists regarding effectors in other Legionella pathogens. Here we sequenced, assembled and characterized the genomes of 38 Legionella species and predicted their effector repertoires using a previously validated machine learning approach. This analysis identified 5,885 predicted effectors. The effector repertoires of different Legionella species were found to be largely non-overlapping, and only seven core effectors were shared by all species studied. Species-specific effectors had atypically low GC content, suggesting exogenous acquisition, possibly from the natural protozoan hosts of these species. Furthermore, we detected numerous new conserved effector domains and discovered new domain combinations, which allowed the inference of as yet undescribed effector functions. The effector collection and network of domain architectures described here can serve as a roadmap for future studies of effector function and evolution.
Affiliation(s)
- David Burstein
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Francisco Amaro
  - Department of Microbiology, University of Chicago, Chicago, Illinois, USA
- Tal Zusman
  - Department of Molecular Microbiology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ziv Lifshitz
  - Department of Molecular Microbiology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Ofir Cohen
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Jack A Gilbert
  - Biology Division, Argonne National Laboratory and Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA
- Tal Pupko
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Howard A Shuman
  - Department of Microbiology, University of Chicago, Chicago, Illinois, USA
- Gil Segal
  - Department of Molecular Microbiology and Biotechnology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
|
17
|
Caufield JH, Abreu M, Wimble C, Uetz P. Protein complexes in bacteria. PLoS Comput Biol 2015; 11:e1004107. [PMID: 25723151 PMCID: PMC4344305 DOI: 10.1371/journal.pcbi.1004107] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/31/2014] [Accepted: 01/02/2015] [Indexed: 01/26/2023] Open
Abstract
Large-scale analyses of protein complexes have recently become available for Escherichia coli and Mycoplasma pneumoniae, yielding 443 and 116 heteromultimeric soluble protein complexes, respectively. We have coupled the results of these mass spectrometry-characterized protein complexes with the 285 “gold standard” protein complexes identified by EcoCyc. A comparison with databases of gene orthology, conservation, and essentiality identified proteins conserved or lost in complexes of other species. For instance, of 285 “gold standard” protein complexes in E. coli, less than 10% are fully conserved among a set of 7 distantly-related bacterial “model” species. Complex conservation follows one of three models: well-conserved complexes, complexes with a conserved core, and complexes with partial conservation but no conserved core. Expanding the comparison to 894 distinct bacterial genomes illustrates fractional conservation and the limits of co-conservation among components of protein complexes: just 14 out of 285 model protein complexes are perfectly conserved across 95% of the genomes used, yet we predict more than 180 may be partially conserved across at least half of the genomes. No clear relationship between gene essentiality and protein complex conservation is observed, as even poorly conserved complexes contain a significant number of essential proteins. Finally, we identify 183 complexes containing well-conserved components and uncharacterized proteins which will be interesting targets for future experimental studies.
Though more than 20,000 binary protein-protein interactions have been published for a few well-studied bacterial species, the results rarely capture the full extent to which proteins take part in complexes. Here, we use experimentally-observed protein complexes from E. coli or Mycoplasma pneumoniae, as well as gene orthology, to predict protein complexes across many species of bacteria. Surprisingly, the majority of protein complexes is not conserved, demonstrating an unexpected evolutionary flexibility. We also observe broader trends within protein complex conservation, especially in genome-reduced species with minimal sets of protein complexes.
Affiliation(s)
- J. Harry Caufield
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Marco Abreu
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Christopher Wimble
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Peter Uetz
  - Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
|
18
|
Dib L, Silvestro D, Salamin N. Evolutionary footprint of coevolving positions in genes. Bioinformatics 2014; 30:1241-9. [DOI: 10.1093/bioinformatics/btu012] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/18/2023] Open
|