1
Gulay A, Fournier G, Smets BF, Girguis PR. Proterozoic Acquisition of Archaeal Genes for Extracellular Electron Transfer: A Metabolic Adaptation of Aerobic Ammonia-Oxidizing Bacteria to Oxygen Limitation. Mol Biol Evol 2023; 40:msad161. [PMID: 37440531 PMCID: PMC10415592 DOI: 10.1093/molbev/msad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 06/09/2023] [Accepted: 07/06/2023] [Indexed: 07/15/2023] Open
Abstract
Many aerobic microbes can utilize alternative electron acceptors under oxygen-limited conditions. In some cases, this is mediated by extracellular electron transfer (or EET), wherein electrons are transferred to extracellular oxidants such as iron oxide and manganese oxide minerals. Here, we show that an ammonia-oxidizer previously known to be strictly aerobic, Nitrosomonas communis, may have been able to utilize a poised electrode to maintain metabolic activity in anoxic conditions. The presence and activity of multiheme cytochromes in N. communis further suggest a capacity for EET. Molecular clock analysis shows that the ancestors of β-proteobacterial ammonia oxidizers appeared after Earth's atmospheric oxygenation when the oxygen levels were >10⁻⁴ pO2 (present atmospheric level [PAL]), consistent with aerobic origins. Equally important, phylogenetic reconciliations of gene and species trees show that the multiheme c-type EET proteins in Nitrosomonas and Nitrosospira lineages were likely acquired by gene transfer from γ-proteobacteria when the oxygen levels were between 0.1 and 1 pO2 (PAL). These results suggest that β-proteobacterial EET evolved during the Proterozoic when oxygen limitation was widespread, but oxidized minerals were abundant.
Affiliation(s)
- Arda Gulay
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Environmental and Resource Engineering, Technical University of Denmark, Lyngby, Denmark
- Greg Fournier
  - Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Barth F Smets
  - Department of Environmental and Resource Engineering, Technical University of Denmark, Lyngby, Denmark
- Peter R Girguis
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
2
Dennler O, Coste F, Blanquart S, Belleannée C, Théret N. Phylogenetic inference of the emergence of sequence modules and protein-protein interactions in the ADAMTS-TSL family. PLoS Comput Biol 2023; 19:e1011404. [PMID: 37651409 PMCID: PMC10499240 DOI: 10.1371/journal.pcbi.1011404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 09/13/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023] Open
Abstract
Numerous computational methods based on sequences or structures have been developed for the characterization of protein function, but they are still unsatisfactory to deal with the multiple functions of multi-domain protein families. Here we propose an original approach based on 1) the detection of conserved sequence modules using partial local multiple alignment, 2) the phylogenetic inference of species/genes/modules/functions evolutionary histories, and 3) the identification of co-appearances of modules and functions. Applying our framework to the multidomain ADAMTS-TSL family including ADAMTS (A Disintegrin-like and Metalloproteinase with ThromboSpondin motif) and ADAMTS-like proteins over nine species including human, we identify 45 sequence module signatures that are associated with the occurrence of 278 Protein-Protein Interactions in ancestral genes. Some of these signatures are supported by published experimental data and the others provide new insights (e.g. ADAMTS-5). The module signatures of ADAMTS ancestors notably highlight the dual variability of the propeptide and ancillary regions suggesting the importance of these two regions in the specialization of ADAMTS during evolution. Our analyses further indicate convergent interactions of ADAMTS with COMP and CCN2 proteins. Overall, our study provides 186 sequence module signatures that discriminate distinct subgroups of ADAMTS and ADAMTSL and that may result from selective pressures on novel functions and phenotypes.
Affiliation(s)
- Olivier Dennler
  - Univ Rennes, Inria, CNRS, IRISA, UMR 6074, Rennes, France
  - Univ Rennes, Inserm, EHESP, Irset, UMR S1085, Rennes, France
- François Coste
  - Univ Rennes, Inria, CNRS, IRISA, UMR 6074, Rennes, France
- Nathalie Théret
  - Univ Rennes, Inria, CNRS, IRISA, UMR 6074, Rennes, France
  - Univ Rennes, Inserm, EHESP, Irset, UMR S1085, Rennes, France
3
Bryce S, Stolzer M, Crosby D, Yang R, Durand D, Lee TH. Human atlastin-3 is a constitutive ER membrane fusion catalyst. J Cell Biol 2023; 222:e202211021. [PMID: 37102997 PMCID: PMC10140384 DOI: 10.1083/jcb.202211021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 02/28/2023] [Accepted: 04/04/2023] [Indexed: 04/28/2023] Open
Abstract
Homotypic membrane fusion catalyzed by the atlastin (ATL) GTPase sustains the branched endoplasmic reticulum (ER) network in metazoans. Our recent discovery that two of the three human ATL paralogs (ATL1/2) are C-terminally autoinhibited implied that relief of autoinhibition would be integral to the ATL fusion mechanism. An alternative hypothesis is that the third paralog ATL3 promotes constitutive ER fusion with relief of ATL1/2 autoinhibition used conditionally. However, published studies suggest ATL3 is a weak fusogen at best. Contrary to expectations, we demonstrate here that purified human ATL3 catalyzes efficient membrane fusion in vitro and is sufficient to sustain the ER network in triple knockout cells. Strikingly, ATL3 lacks any detectable C-terminal autoinhibition, like the invertebrate Drosophila ATL ortholog. Phylogenetic analysis of ATL C-termini indicates that C-terminal autoinhibition is a recent evolutionary innovation. We suggest that ATL3 is a constitutive ER fusion catalyst and that ATL1/2 autoinhibition likely evolved in vertebrates as a means of upregulating ER fusion activity on demand.
Affiliation(s)
- Samantha Bryce
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Maureen Stolzer
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Daniel Crosby
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
- Ruijin Yang
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Dannie Durand
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
  - Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Tina H. Lee
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, USA
4
Fong SL, Capra JA. Function and Constraint in Enhancer Sequences with Multiple Evolutionary Origins. Genome Biol Evol 2022; 14:evac159. [PMID: 36314566 PMCID: PMC9673499 DOI: 10.1093/gbe/evac159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/22/2022] [Indexed: 11/04/2022] Open
Abstract
Thousands of human gene regulatory enhancers are composed of sequences with multiple evolutionary origins. These evolutionarily "complex" enhancers consist of older "core" sequences and younger "derived" sequences. However, the functional relationship between the sequences of different evolutionary origins within complex enhancers is poorly understood. We evaluated the function, selective pressures, and sequence variation across core and derived components of human complex enhancers. We find that both components are older than expected from the genomic background, and complex enhancers are enriched for core and derived sequences of similar evolutionary ages. Both components show strong evidence of biochemical activity in massively parallel reporter assays. However, core and derived sequences have distinct transcription factor (TF)-binding preferences that are largely similar across evolutionary origins. As expected, given these signatures of function, both core and derived sequences have substantial evidence of purifying selection. Nonetheless, derived sequences exhibit weaker purifying selection than adjacent cores. Derived sequences also tolerate more common genetic variation and are enriched compared with cores for expression quantitative trait loci associated with gene expression variability in human populations. In conclusion, both core and derived sequences have strong evidence of gene regulatory function, but derived sequences have distinct constraint profiles, TF-binding preferences, and tolerance to variation compared with cores. We propose that the step-wise integration of younger derived with older core sequences has generated regulatory substrates with robust activity and the potential for functional variation. Our analyses demonstrate that synthesizing study of enhancer evolution and function can aid interpretation of regulatory sequence activity and functional variation across human populations.
Affiliation(s)
- Sarah L Fong
  - Vanderbilt Genetics Institute, Vanderbilt University, Nashville, Tennessee
- John A Capra
  - Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee
  - Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco
5
Menet H, Daubin V, Tannier E. Phylogenetic reconciliation. PLoS Comput Biol 2022; 18:e1010621. [PMID: 36327227 PMCID: PMC9632901 DOI: 10.1371/journal.pcbi.1010621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Affiliation(s)
- Hugo Menet
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Vincent Daubin
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - * E-mail: (VD); (ET)
- Eric Tannier
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
  - Inria, centre de recherche de Lyon, Villeurbanne, France
  - * E-mail: (VD); (ET)
6
Penel S, Menet H, Tricou T, Daubin V, Tannier E. Thirdkind: displaying phylogenetic encounters beyond 2-level reconciliation. Bioinformatics 2022; 38:2350-2352. [PMID: 35139153 DOI: 10.1093/bioinformatics/btac062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Revised: 01/26/2022] [Accepted: 02/03/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Reconciliation between a host and its symbiont phylogenies or between a species and a gene phylogenies is a prevalent approach in evolution; however, no simple generic tool (i.e. virtually usable by all reconciliation software, from host/symbiont to species/gene comparisons) is available to visualize reconciliation results. Moreover, there is no tool to visualize 3-level reconciliations, i.e. to visualize 2 nested reconciliations as for example in a host/symbiont/gene complex. RESULTS: Thirdkind is a light and easy to install command line software producing svg files displaying reconciliations, including 3-level reconciliations. It takes a standard format recPhyloXML as input, and is thus usable with most reconciliation software. AVAILABILITY AND IMPLEMENTATION: https://github.com/simonpenel/thirdkind/wiki. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Simon Penel
  - Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
- Hugo Menet
  - Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
- Théo Tricou
  - Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
- Vincent Daubin
  - Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
- Eric Tannier
  - Laboratoire de Biométrie et Biologie Evolutive/UMR5558, CNRS/UCBL, Villeurbanne 69622, France
  - Centre de Recherche Inria Lyon, Villeurbanne 69622, France
7
Bansal MS. Deciphering Microbial Gene Family Evolution Using Duplication-Transfer-Loss Reconciliation and RANGER-DTL. Methods Mol Biol 2022; 2569:233-252. [PMID: 36083451 DOI: 10.1007/978-1-0716-2691-7_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Phylogenetic reconciliation has emerged as a principled, highly effective technique for investigating the origin, spread, and evolutionary history of microbial gene families. Proper application of phylogenetic reconciliation requires a clear understanding of potential pitfalls and sources of error, and knowledge of the most effective reconciliation-based tools and protocols to use to maximize accuracy. In this book chapter, we provide a brief overview of Duplication-Transfer-Loss (DTL) reconciliation, the standard reconciliation model used to study microbial gene families and provide a step-by-step computational protocol to maximize the accuracy of DTL reconciliation and minimize false-positive evolutionary inferences.
Affiliation(s)
- Mukul S Bansal
  - Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA
8
Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou L, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Sci 2022; 31:8-22. [PMID: 34717010 PMCID: PMC8740835 DOI: 10.1002/pro.4218] [Citation(s) in RCA: 533] [Impact Index Per Article: 266.5] [Received: 08/26/2021] [Revised: 10/24/2021] [Accepted: 10/26/2021] [Indexed: 02/03/2023]
Abstract
Phylogenetics is a powerful tool for analyzing protein sequences, by inferring their evolutionary relationships to other proteins. However, phylogenetics analyses can be challenging: they are computationally expensive and must be performed carefully in order to avoid systematic errors and artifacts. Protein Analysis THrough Evolutionary Relationships (PANTHER; http://pantherdb.org) is a publicly available, user-focused knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline that includes computational and manual processes and quality control steps. First, fully reconciled phylogenetic trees (including ancestral protein sequences) are reconstructed for a set of "reference" protein sequences obtained from fully sequenced genomes of organisms across the tree of life. Second, the resulting phylogenetic trees are manually reviewed and annotated with function evolution events: inferred gains and losses of protein function along branches of the phylogenetic tree. Here, we describe in detail the current contents of PANTHER, how those contents are generated, and how they can be used in a variety of applications. The PANTHER knowledgebase can be downloaded or accessed via an extensive API. In addition, PANTHER provides software tools to facilitate the application of the knowledgebase to common protein sequence analysis tasks: exploring an annotated genome by gene function; performing "enrichment analysis" of lists of genes; annotating a single sequence or large batch of sequences by homology; and assessing the likelihood that a genetic variant at a particular site in a protein will have deleterious effects.
Affiliation(s)
- Paul D. Thomas
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Dustin Ebert
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Anushya Muruganujan
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Tremayne Mushayahama
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Laurent-Philippe Albou
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Huaiyu Mi
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
9
Aluru C, Singh M. Improved inference of tandem domain duplications. Bioinformatics 2021; 37:i133-i141. [PMID: 34252920 PMCID: PMC8275333 DOI: 10.1093/bioinformatics/btab329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/03/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a single evolutionary event. Existing methods for inferring domain-level evolutionary events are based on reconciling domain trees with gene trees. While some formulations consider multiple domain duplications, they do not explicitly model tandem duplications; this leads to inaccurate inference of which domains duplicated together over the course of evolution. RESULTS: Here, we introduce a reconciliation-based framework that considers the relative positions of domains within extant sequences. We use this information to uncover tandem domain duplications within the evolutionary history of these genes. We devise an integer linear programming approach that solves our problem exactly, and a heuristic approach that works well in practice. We perform extensive simulation studies to demonstrate that our approaches can accurately uncover single and tandem domain duplications, and additionally test our approach on a well-studied orthogroup where lineage-specific domain expansions exhibit varying and complex domain duplication patterns. AVAILABILITY AND IMPLEMENTATION: Code is available on GitHub at https://github.com/Singh-Lab/TandemDuplications. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chaitanya Aluru
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
- Mona Singh
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
10
Linard B, Ebersberger I, McGlynn SE, Glover N, Mochizuki T, Patricio M, Lecompte O, Nevers Y, Thomas PD, Gabaldón T, Sonnhammer E, Dessimoz C, Uchiyama I. Ten Years of Collaborative Progress in the Quest for Orthologs. Mol Biol Evol 2021; 38:3033-3045. [PMID: 33822172 PMCID: PMC8321534 DOI: 10.1093/molbev/msab098] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/27/2020] [Revised: 02/07/2021] [Accepted: 04/01/2021] [Indexed: 12/19/2022] Open
Abstract
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
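The distinction drawn in this abstract between orthologs (separated by a speciation event) and paralogs (separated by a duplication event) can be illustrated with the species-overlap heuristic. This is a generic simplification, not the method of any tool presented at the QFO meeting. The sketch below assumes a rooted binary gene tree written as nested tuples, and the leaf labels of the form `species|gene` are hypothetical.

```python
def species_of(tree):
    """Return the set of species found at the leaves under `tree`."""
    if isinstance(tree, str):              # leaf label, e.g. "human|A1"
        return {tree.split("|")[0]}
    left, right = tree
    return species_of(left) | species_of(right)


def leaves(tree):
    """Return the leaf labels under `tree`, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def classify_pairs(tree, out=None):
    """Label each gene pair by the event inferred at its last common ancestor.

    Species-overlap rule (an assumption of this sketch): if the two child
    subtrees share at least one species, the node is treated as a duplication
    and the pairs it separates as paralogs; otherwise it is treated as a
    speciation and the pairs as orthologs.
    """
    if out is None:
        out = {}
    if isinstance(tree, str):
        return out
    left, right = tree
    overlap = species_of(left) & species_of(right)
    relation = "paralog" if overlap else "ortholog"
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = relation
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


# Hypothetical toy family: one duplication before the human/mouse split.
toy_tree = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "fly|A")
pairs = classify_pairs(toy_tree)
for pair, relation in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print(" / ".join(sorted(pair)), "->", relation)
```

The heuristic ignores gene loss and horizontal transfer, which is why full reconciliation against a species tree is generally preferred when those events matter.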
Affiliation(s)
- Benjamin Linard
  - LIRMM, University of Montpellier, CNRS, Montpellier, France
  - SPYGEN, Le Bourget-du-Lac, France
- Ingo Ebersberger
  - Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, Germany
  - LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt, Germany
- Shawn E McGlynn
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
  - Blue Marble Space Institute of Science, Seattle, WA, USA
- Natasha Glover
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Tomohiro Mochizuki
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
- Mateus Patricio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Odile Lecompte
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Yannis Nevers
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Toni Gabaldón
  - Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona, Barcelona, Spain
  - Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Erik Sonnhammer
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Christophe Dessimoz
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Department of Computer Science, University College London, London, United Kingdom
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ikuo Uchiyama
  - Department of Theoretical Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
11
Zhang Z, Wang W, Xia R, Pan G, Wang J, Tang J. Achieving large and distant ancestral genome inference by using an improved discrete quantum-behaved particle swarm optimization algorithm. BMC Bioinformatics 2020; 21:516. [PMID: 33176688 PMCID: PMC7656761 DOI: 10.1186/s12859-020-03833-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2020] [Accepted: 10/23/2020] [Indexed: 11/16/2022] Open
Abstract
Background: Reconstructing ancestral genomes is one of the central problems presented in genome rearrangement analysis since finding the most likely true ancestor is of significant importance in phylogenetic reconstruction. Large scale genome rearrangements can provide essential insights into evolutionary processes. However, when the genomes are large and distant, classical median solvers have failed to adequately address these challenges due to the exponential increase of the search space. Consequently, solving ancestral genome inference problems constitutes a task of paramount importance that continues to challenge the current methods used in this area, whose difficulty is further increased by the ongoing rapid accumulation of whole-genome data. Results: In response to these challenges, we provide two contributions for ancestral genome inference. First, an improved discrete quantum-behaved particle swarm optimization algorithm (IDQPSO) by averaging two of the fitness values is proposed to address the discrete search space. Second, we incorporate DCJ sorting into the IDQPSO (IDQPSO-Median). In comparison with the other methods, when the genomes are large and distant, IDQPSO-Median has the lowest median score, the highest adjacency accuracy, and the closest distance to the true ancestor. In addition, we have integrated our IDQPSO-Median approach with the GRAPPA framework. Our experiments show that this new phylogenetic method is very accurate and effective by using IDQPSO-Median. Conclusions: Our experimental results demonstrate the advantages of IDQPSO-Median approach over the other methods when the genomes are large and distant. When our experimental results are evaluated in a comprehensive manner, it is clear that the IDQPSO-Median approach we propose achieves better scalability compared to existing algorithms. Moreover, our experimental results by using simulated and real datasets confirm that the IDQPSO-Median, when integrated with the GRAPPA framework, outperforms other heuristics in terms of accuracy, while also continuing to infer phylogenies that were equivalent or close to the true trees within 5 days of computation, which is far beyond the difficulty level that can be handled by GRAPPA.
Affiliation(s)
- Zhaojuan Zhang
  - College of Computer Science and Technology, Zhejiang University of Technology, Liuhe Road, Hangzhou, China
- Wanliang Wang
  - College of Computer Science and Technology, Zhejiang University of Technology, Liuhe Road, Hangzhou, China
- Ruofan Xia
  - Department of Computer Science and Engineering, University of South Carolina, Assembly Street, Columbia, USA
- Gaofeng Pan
  - Department of Computer Science and Engineering, University of South Carolina, Assembly Street, Columbia, USA
- Jiandong Wang
  - Department of Computer Science and Engineering, University of South Carolina, Assembly Street, Columbia, USA
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Assembly Street, Columbia, USA
  - Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Yaguan Road, Tianjin, China
12
Xiao X, Xue GF, Stamatovic B, Qiu WR. Using Cellular Automata to Simulate Domain Evolution in Proteins. Front Genet 2020; 11:515. [PMID: 32582278 PMCID: PMC7296063 DOI: 10.3389/fgene.2020.00515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2019] [Accepted: 04/28/2020] [Indexed: 11/26/2022] Open
Abstract
Proteins play primary roles in important biological processes such as catalysis, physiological functions, and immune system functions. Thus, the research on how proteins evolved has been a nuclear question in the field of evolutionary biology. General models of protein evolution help to determine the baseline expectations for evolution of sequences, and these models have been extensively useful in sequence analysis as well as for the computer simulation of artificial sequence data sets. We have developed a new method of simulating multi-domain protein evolution, including fusions of domains, insertion, and deletion. It has been observed via the simulation test that the success rates achieved by the proposed predictor are remarkably high. For the convenience of the most experimental scientists, a user-friendly web server has been established at http://jci-bioinfo.cn/domainevo, by which users can easily get their desired results without having to go through the detailed mathematics. Through the simulation results of this website, users can predict the evolution trend of the protein domain architecture.
Affiliation(s)
- Xuan Xiao
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Guang-Fu Xue
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
- Biljana Stamatovic
  - Faculty of Information Systems and Technologies, University of Donja Gorica, Podgorica, Montenegro
- Wang-Ren Qiu
  - Computer Department, Jing-De-Zhen Ceramic Institute, Jingdezhen, China
13
Castillo JA, Secaira-Morocho H, Maldonado S, Sarmiento KN. Diversity and Evolutionary Dynamics of Antiphage Defense Systems in Ralstonia solanacearum Species Complex. Front Microbiol 2020; 11:961. [PMID: 32508782 PMCID: PMC7251935 DOI: 10.3389/fmicb.2020.00961] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/28/2020] [Accepted: 04/22/2020] [Indexed: 12/20/2022] Open
Abstract
Over the years, many researchers have reported a great diversity of bacteriophages infecting members of the Ralstonia solanacearum species complex (RSSC). This diversity has driven bacterial evolution by leading the emergence and maintenance of bacterial defense systems to combat phage infection. In this work, we present an in silico study of the arsenal of defense systems that RSSC harbors and their evolutionary history. For this purpose, we used a combination of genomic, phylogenetic and associative methods. We found that in addition to the CRISPR-Cas system already reported, there are eight other antiphage defense systems including the well-known Restriction-Modification and Toxin-Antitoxin systems. Furthermore, we found a tenth defense system, which is dedicated to reducing the incidence of plasmid transformation in bacteria. We undertook an analysis of the gene gain and loss patterns of the defense systems in 15 genomes of RSSC. Results indicate that the dynamics are inclined toward the gain of defense genes as opposed to the rest of the genes that were preferably lost throughout evolution. This was confirmed by evidence on independent gene acquisition that has occurred by profuse horizontal transfer. The mutation and recombination rates were calculated as a proxy of evolutionary rates. Again, genes encoding the defense systems follow different rates of evolution respect to the rest of the genes. These results lead us to conclude that the evolution of RSSC defense systems is highly dynamic and responds to a different evolutionary regime than the rest of the genes in the genomes of RSSC.
Affiliation(s)
- José A Castillo
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
- Henry Secaira-Morocho
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
- Stephanie Maldonado
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
- Katlheen N Sarmiento
- School of Biological Sciences and Engineering, Yachay Tech University, San Miguel de Urcuquí, Ecuador
|
14
|
Kundu S, Bansal MS. SaGePhy: an improved phylogenetic simulation framework for gene and subgene evolution. Bioinformatics 2019; 35:3496-3498. [PMID: 30715213 DOI: 10.1093/bioinformatics/btz081] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/22/2018] [Revised: 01/21/2019] [Accepted: 01/31/2019] [Indexed: 11/14/2022] Open
Abstract
SUMMARY: SaGePhy is a software package for improved phylogenetic simulation of gene and subgene evolution. SaGePhy can be used to generate species trees, gene trees and subgene or (protein) domain trees using a probabilistic birth-death process that allows for gene and subgene duplication, horizontal gene and subgene transfer and gene and subgene loss. SaGePhy implements a range of important features not found in other phylogenetic simulation frameworks/software. These include (i) simulation of subgene or domain level evolution inside one or more gene trees, (ii) simultaneous simulation of both additive and replacing horizontal gene/subgene transfers and (iii) probabilistic sampling of species tree and gene tree nodes, respectively, for gene- and domain-family birth. SaGePhy is open-source, platform independent and written in Java and Python. AVAILABILITY AND IMPLEMENTATION: Executables, source code (open-source under the revised BSD license) and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/sagephy/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
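To make the idea of a probabilistic birth-death process concrete, the following is a minimal sketch of how gene copy number could be simulated along a single species-tree branch. It is not SaGePhy's implementation or API: the function name `simulate_copies`, its parameters, and the restriction to duplication and loss only (no transfers, no subgene level) are assumptions made for illustration.

```python
import random


def simulate_copies(dup_rate, loss_rate, branch_length, start_copies=1, seed=None):
    """Simulate gene copy number along one branch (hypothetical helper).

    Each existing copy duplicates at rate dup_rate and is lost at rate
    loss_rate; waiting times between events are exponential (Gillespie scheme).
    Returns the final copy number and the list of (time, event) pairs.
    """
    rng = random.Random(seed)
    copies, t = start_copies, 0.0
    events = []
    while copies > 0:
        total_rate = copies * (dup_rate + loss_rate)
        if total_rate == 0:
            break  # nothing can happen on this branch
        t += rng.expovariate(total_rate)
        if t >= branch_length:
            break  # the next event would fall beyond the end of the branch
        # Choose the event type in proportion to its rate.
        if rng.random() < dup_rate / (dup_rate + loss_rate):
            copies += 1
            events.append((t, "duplication"))
        else:
            copies -= 1
            events.append((t, "loss"))
    return copies, events
```

Calling `simulate_copies(0.3, 0.2, 1.0, seed=42)` returns the number of copies surviving at the end of the branch together with the ordered event history; a full simulator would repeat this along every branch of a species tree and record the resulting gene tree.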
Affiliation(s)
- Soumya Kundu
- Department of Computer Science & Engineering, Storrs, CT, USA
- Mukul S Bansal
- Department of Computer Science & Engineering, Storrs, CT, USA; The Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
|
15
|
Heller D, Szklarczyk D, von Mering C. Tree reconciliation combined with subsampling improves large scale inference of orthologous group hierarchies. BMC Bioinformatics 2019; 20:228. [PMID: 31060495 PMCID: PMC6501302 DOI: 10.1186/s12859-019-2828-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/19/2018] [Accepted: 04/17/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: An orthologous group (OG) comprises a set of orthologous and paralogous genes that share a last common ancestor (LCA). OGs are defined with respect to a chosen taxonomic level, which delimits the position of the LCA in time to a specified speciation event. A hierarchy of OGs expands on this notion, connecting more general OGs, distant in time, to more recent, fine-grained OGs, thereby spanning multiple levels of the tree of life. Large scale inference of OG hierarchies with independently computed taxonomic levels can suffer from inconsistencies between successive levels, such as the position in time of a duplication event. This can be due to confounding genetic signal or algorithmic limitations. Importantly, inconsistencies limit the potential use of OGs for functional annotation and third-party applications. RESULTS: Here we present a new methodology to ensure hierarchical consistency of OGs across taxonomic levels. To resolve an inconsistency, we subsample the protein space of the OG members and perform gene tree-species tree reconciliation for each sampling. Unlike previous approaches, by subsampling the protein space, we avoid the notoriously difficult task of accurately building and reconciling very large phylogenies. We implement the method in a high-throughput pipeline and apply it to the eggNOG database. We use independent protein domain definitions to validate its performance. CONCLUSION: The presented consistency pipeline shows that, contrary to previous limitations, tree reconciliation can be a useful instrument for the construction of OG hierarchies. The key lies in the combination of sampling smaller trees and aggregating their reconciliations for robustness. Results show comparable or greater performance to previous pipelines. The code is available on GitHub at: https://github.com/meringlab/og_consistency_pipeline.
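As an illustration of what gene tree-species tree reconciliation computes, here is a minimal sketch of the classical LCA mapping under a duplication-loss model. It is not the authors' pipeline: the data layout (a child-to-parent dictionary for the species tree, nested tuples for the gene tree), the function names, and the toy trees are all assumptions, and transfers and subsampling are not modeled.

```python
def ancestors(parent, node):
    """Return the path from node up to the root of the species tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(parent, a, b):
    """Last common ancestor of species-tree nodes a and b."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, gene_to_species, parent, events=None):
    """Map each internal gene-tree node to a species-tree node (post-order).

    A node is labeled a duplication when it maps to the same species node
    as one of its children, and a speciation otherwise.
    """
    if events is None:
        events = []
    if isinstance(gene_tree, str):  # leaf: look up the species it came from
        return gene_to_species[gene_tree], events
    left, right = gene_tree
    m_left, _ = reconcile(left, gene_to_species, parent, events)
    m_right, _ = reconcile(right, gene_to_species, parent, events)
    mapping = lca(parent, m_left, m_right)
    kind = "duplication" if mapping in (m_left, m_right) else "speciation"
    events.append((mapping, kind))
    return mapping, events


# Toy species tree ((A,B)AB,C)ABC and a gene tree with two copies in species A.
species_parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC"}
gene_species = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
toy_gene_tree = (("a1", "b1"), ("a2", "c1"))
root_mapping, toy_events = reconcile(toy_gene_tree, gene_species, species_parent)
# toy_events == [("AB", "speciation"), ("ABC", "speciation"), ("ABC", "duplication")]
```

In the toy example the gene-tree root maps to the same species node as its right child, so it is inferred to be a duplication that predates the split of all three species.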
Affiliation(s)
- Davide Heller
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
- Damian Szklarczyk
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
- Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057 Switzerland
- SIB Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, 1015 Switzerland
|
16
|
Li L, Bansal MS. An Integrated Reconciliation Framework for Domain, Gene, and Species Level Evolution. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:63-76. [PMID: 29994126 DOI: 10.1109/tcbb.2018.2846253] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/08/2023]
Abstract
The majority of genes in eukaryotes consists of one or more protein domains that can be independently lost or gained during evolution. This gain and loss of protein domains, through domain duplications, transfers, or losses, has important evolutionary and functional consequences. Yet, even though it is well understood that domains evolve inside genes and genes inside species, there do not exist any computational frameworks to simultaneously model the evolution of domains, genes, and species and account for their inter-dependency. Here, we develop an integrated model of domain evolution that explicitly captures the interdependence of domain-, gene-, and species-level evolution. Our model extends the classical phylogenetic reconciliation framework, which infers gene family evolution by comparing gene trees and species trees, by explicitly considering domain-level evolution and decoupling domain-level events from gene-level events. In this paper, we (i) introduce the new integrated reconciliation framework, (ii) prove that the associated optimization problem is NP-hard, (iii) devise an efficient heuristic solution for the problem, (iv) apply our algorithm to a large biological dataset, and (v) demonstrate the impact of using our new computational framework compared to existing approaches. The implemented software is freely available from http://compbio.engr.uconn.edu/software/seadog/.
|
17
|
Abstract
This chapter reviews current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this will directly impact which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multi-domain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly). We end with a discussion of some available tools for computational analysis or exploitation of protein domain architectures and their evolution.
|
18
|
Duchemin W, Anselmetti Y, Patterson M, Ponty Y, Bérard S, Chauve C, Scornavacca C, Daubin V, Tannier E. DeCoSTAR: Reconstructing the Ancestral Organization of Genes or Genomes Using Reconciled Phylogenies. Genome Biol Evol 2018; 9:1312-1319. [PMID: 28402423 PMCID: PMC5441342 DOI: 10.1093/gbe/evx069] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Accepted: 04/07/2017] [Indexed: 12/15/2022] Open
Abstract
DeCoSTAR is software for reconstructing the organization of ancestral genes or genomes in the form of sets of neighborhood relations (adjacencies) between pairs of ancestral genes or gene domains. It can also improve the assembly of fragmented genomes by proposing evolutionary-induced adjacencies between scaffolding fragments. Ancestral genes or domains are deduced from reconciled phylogenetic trees under an evolutionary model that considers gains, losses, speciations, duplications, and transfers as possible events for gene evolution. Reconciliations are either given as input or computed with the ecceTERA package, into which DeCoSTAR is integrated. DeCoSTAR computes adjacency evolutionary scenarios using a scoring scheme based on a weighted sum of adjacency gains and breakages. Solutions, both optimal and near-optimal, are sampled according to the Boltzmann–Gibbs distribution centered around parsimonious solutions, and statistical supports on ancestral and extant adjacencies are provided. DeCoSTAR supports the features of previously contributed tools that reconstruct ancestral adjacencies, namely DeCo, DeCoLT, ART-DeCo, and DeClone. In a few minutes, DeCoSTAR can reconstruct the evolutionary history of domains inside genes, of gene fusion and fission events, or of gene order along chromosomes, for large data sets including dozens of whole genomes from all kingdoms of life. We illustrate the potential of DeCoSTAR with several applications: ancestral reconstruction of gene orders for Anopheles mosquito genomes, multidomain proteins in Drosophila, and gene fusion and fission detection in Actinobacteria. Availability: http://pbil.univ-lyon1.fr/software/DeCoSTAR (Last accessed April 24, 2017).
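The scoring and sampling scheme described above can be sketched in a few lines. This is an illustrative toy, not DeCoSTAR's code or interface: the function names, the default unit costs, and the explicit enumeration of candidate scenarios are assumptions (the real tool works over reconciled trees rather than an explicit list of scenarios).

```python
import math
import random


def scenario_cost(gains, breaks, gain_cost=1.0, break_cost=1.0):
    """Weighted sum of adjacency gains and breakages (hypothetical weights)."""
    return gain_cost * gains + break_cost * breaks


def boltzmann_weights(costs, temperature):
    """Probability of each scenario, proportional to exp(-cost / temperature).

    Costs are shifted by the minimum before exponentiation for numerical
    stability; this does not change the normalized probabilities.
    """
    best = min(costs)
    raw = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(raw)
    return [w / total for w in raw]


def sample_scenario(costs, temperature, rng=random):
    """Draw the index of one scenario from the Boltzmann-Gibbs distribution."""
    weights = boltzmann_weights(costs, temperature)
    return rng.choices(range(len(costs)), weights=weights, k=1)[0]


# Three candidate scenarios given as (gains, breaks); the first is most parsimonious.
toy_costs = [scenario_cost(g, b) for g, b in [(1, 1), (2, 1), (3, 2)]]
toy_probs = boltzmann_weights(toy_costs, temperature=1.0)
```

Lower temperatures concentrate the probability mass on the most parsimonious scenario, while higher temperatures give near-optimal scenarios more weight; repeated sampling then yields the frequency with which each adjacency appears, which is one way to read a statistical support.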
Affiliation(s)
- Wandrille Duchemin
- Inria Grenoble Rhône-Alpes, Montbonnot, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Yoann Anselmetti
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France; Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Murray Patterson
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France; Experimental Algorithmics Lab (AlgoLab), Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano-Bicocca, Viale Sarca, Milano, Italy
- Yann Ponty
- CNRS, Ecole Polytechnique, LIX UMR7161, Palaiseau, France; Inria Saclay, EP AMIB, Palaiseau, France
- Sèverine Bérard
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France; LIRMM, Université de Montpellier, CNRS, Montpellier, France
- Cedric Chauve
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
- Celine Scornavacca
- Institut des Sciences de l'Évolution, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Vincent Daubin
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Eric Tannier
- Inria Grenoble Rhône-Alpes, Montbonnot, France; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
|
19
|
Liebeskind BJ, Hofmann HA, Hillis DM, Zakon HH. Evolution of Animal Neural Systems. Annual Review of Ecology, Evolution, and Systematics 2017. [DOI: 10.1146/annurev-ecolsys-110316-023048] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 02/04/2023]
Abstract
Nervous systems are among the most spectacular products of evolution. Their provenance and evolution have been of interest and often the subjects of intense debate since the late nineteenth century. The genomics era has provided researchers with a new set of tools with which to study the early evolution of neurons, and recent progress on the molecular evolution of the first neurons has been both exciting and frustrating. It has become increasingly obvious that genomic data are often insufficient to reconstruct complex phenotypes in deep evolutionary time because too little is known about how gene function evolves over deep time. Therefore, additional functional data across the animal tree are a prerequisite to a fuller understanding of cell evolution. To this end, we review the functional modules of neurons and the evolution of their molecular components, and we introduce the idea of hierarchical molecular evolution.
Affiliation(s)
- Benjamin J. Liebeskind
- Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, Texas 78712
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas 78712
- Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, Texas 78712
- Hans A. Hofmann
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas 78712
- Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, Texas 78712
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas 78712
- Institute for Neuroscience, University of Texas at Austin, Austin, Texas 78712
- David M. Hillis
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas 78712
- Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, Texas 78712
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas 78712
- Harold H. Zakon
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas 78712
- Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, Texas 78712
- Department of Integrative Biology, University of Texas at Austin, Austin, Texas 78712
- Department of Neuroscience, University of Texas at Austin, Austin, Texas 78712
- Institute for Neuroscience, University of Texas at Austin, Austin, Texas 78712
|