1
|
Assis R, Conant G, Holland B, Liberles DA, O'Reilly MM, Wilson AE. Models for the retention of duplicate genes and their biological underpinnings. F1000Res 2024; 12:1400. [PMID: 38173826 PMCID: PMC10762295 DOI: 10.12688/f1000research.141786.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 02/08/2024] [Indexed: 01/05/2024] Open
Abstract
Gene content in genomes changes through several different processes, with gene duplication being an important contributor to such changes. Gene duplication occurs over a range of scales from individual genes to whole genomes, and the dynamics of this process can be context dependent. Still, there are rules by which genes are retained or lost from genomes after duplication, and probabilistic modeling has enabled characterization of these rules, including their context-dependence. Here, we describe the biology and corresponding mathematical models that are used to understand duplicate gene retention and its contribution to the set of biochemical functions encoded in a genome.
Collapse
Affiliation(s)
- Raquel Assis
- Florida Atlantic University, Boca Raton, Florida, USA
| | - Gavin Conant
- North Carolina State University, Raleigh, North Carolina, USA
| | | | | | | | | |
Collapse
|
2
|
Henry CN, Piper K, Wilson AE, Miraszek JL, Probst CS, Rong Y, Liberles DA. WGDTree: a phylogenetic software tool to examine conditional probabilities of retention following whole genome duplication events. BMC Bioinformatics 2022; 23:505. [PMID: 36434497 PMCID: PMC9701042 DOI: 10.1186/s12859-022-05042-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Accepted: 11/08/2022] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND Multiple processes impact the probability of retention of individual genes following whole genome duplication (WGD) events. In analyzing two consecutive whole genome duplication events that occurred in the lineage leading to Atlantic salmon, a new phylogenetic statistical analysis was developed to examine the contingency of retention in one event based upon retention in a previous event. This analysis is intended to evaluate mechanisms of duplicate gene retention and to provide software to generate the test statistic for any genome with pairs of WGDs in its history. RESULTS Here a software package written in Python, 'WGDTree' for the analysis of duplicate gene retention following whole genome duplication events is presented. Using gene tree-species tree reconciliation to label gene duplicate nodes and differentiate between WGD and SSD duplicates, the tool calculates a statistic based upon the conditional probability of a gene duplicate being retained after a second whole genome duplication dependent upon the retention status after the first event. The package also contains methods for the simulation of gene trees with WGD events. After running simulations, the accuracy of the placement of events has been determined to be high. The conditional probability statistic has been calculated for Phalaenopsis equestris on a monocot species tree with a pair of consecutive WGD events on its lineage, showing the applicability of the method. CONCLUSIONS A new software tool has been created for the analysis of duplicate genes in examination of retention mechanisms. The software tool has been made available on the Python package index and the source code can be found on GitHub here: https://github.com/cnickh/wgdtree .
Collapse
Affiliation(s)
- C. Nicholas Henry
- grid.264727.20000 0001 2248 3398Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122 USA
| | - Kathryn Piper
- grid.264727.20000 0001 2248 3398Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122 USA ,grid.265850.c0000 0001 2151 7947Present Address: Department of Biological Sciences, University at Albany, Albany, NY 12222 USA
| | - Amanda E. Wilson
- grid.264727.20000 0001 2248 3398Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122 USA
| | - John L. Miraszek
- grid.264727.20000 0001 2248 3398Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122 USA ,grid.134936.a0000 0001 2162 3504Present Address: Genetics Area Program, University of Missouri, Columbia, MO 65211 USA
| | - Claire S. Probst
- grid.264727.20000 0001 2248 3398Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122 USA
| | - Yuying Rong
- grid.264727.20000 0001 2248 3398Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122 USA ,grid.256868.70000 0001 2215 7365Department of Biology, Haverford College, Haverford, PA 19041 USA ,grid.4830.f0000 0004 0407 1981Present Address: Groningen Institute for Evolutionary Life Sciences, University of Groningen, 9747 AG Groningen, The Netherlands
| | - David A. Liberles
- grid.264727.20000 0001 2248 3398Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122 USA
| |
Collapse
|
3
|
Counting and sampling gene family evolutionary histories in the duplication-loss and duplication-loss-transfer models. J Math Biol 2020; 80:1353-1388. [PMID: 32060618 PMCID: PMC7052048 DOI: 10.1007/s00285-019-01465-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Revised: 11/18/2019] [Indexed: 10/28/2022]
Abstract
Given a set of species whose evolution is represented by a species tree, a gene family is a group of genes having evolved from a single ancestral gene. A gene family evolves along the branches of a species tree through various mechanisms, including-but not limited to-speciation ([Formula: see text]), gene duplication ([Formula: see text]), gene loss ([Formula: see text]), and horizontal gene transfer ([Formula: see text]). The reconstruction of a gene tree representing the evolution of a gene family constrained by a species tree is an important problem in phylogenomics. However, unlike in the multispecies coalescent evolutionary model that considers only speciation and incomplete lineage sorting events, very little is known about the search space for gene family histories accounting for gene duplication, gene loss and horizontal gene transfer (the [Formula: see text]-model). In this work, we introduce the notion of evolutionary histories defined as a binary ordered rooted tree describing the evolution of a gene family, constrained by a species tree in the [Formula: see text]-model. We provide formal grammars describing the set of all evolutionary histories that are compatible with a given species tree, whether it is ranked or unranked. These grammars allow us, using either analytic combinatorics or dynamic programming, to efficiently compute the number of histories of a given size, and also to generate random histories of a given size under the uniform distribution. We apply these tools to obtain exact asymptotics for the number of gene family histories for two species trees, the rooted caterpillar and complete binary tree, as well as estimates of the range of the exponential growth factor of the number of histories for random species trees of size up to 25. Our results show that including horizontal gene transfers induce a dramatic increase of the number of evolutionary histories. We also show that, within ranked species trees, the number of evolutionary histories in the [Formula: see text]-model is almost independent of the species tree topology. These results establish firm foundations for the development of ensemble methods for the prediction of reconciliations.
Collapse
|
4
|
Duchemin W, Gence G, Arigon Chifolleau AM, Arvestad L, Bansal MS, Berry V, Boussau B, Chevenet F, Comte N, Davín AA, Dessimoz C, Dylus D, Hasic D, Mallo D, Planel R, Posada D, Scornavacca C, Szöllosi G, Zhang L, Tannier É, Daubin V. RecPhyloXML: a format for reconciled gene trees. Bioinformatics 2019; 34:3646-3652. [PMID: 29762653 PMCID: PMC6198865 DOI: 10.1093/bioinformatics/bty389] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2017] [Accepted: 05/09/2018] [Indexed: 12/21/2022] Open
Abstract
Motivation A reconciliation is an annotation of the nodes of a gene tree with evolutionary events—for example, speciation, gene duplication, transfer, loss, etc.—along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs. Results Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative—albeit flexible—specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation http://phylariane.univ-lyon1.fr/recphyloxml/.
Collapse
Affiliation(s)
- Wandrille Duchemin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Guillaume Gence
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - Anne-Muriel Arigon Chifolleau
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France
| | - Lars Arvestad
- Department of Mathematics, Stockholm University, Stockholm, Sweden.,Swedish e-Science Research Centre (SeRC), Stockholm, Sweden
| | - Mukul S Bansal
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.,Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | - Vincent Berry
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Bastien Boussau
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| | - François Chevenet
- LIRMM, Université de Montpellier, CNRS, Montpellier, France.,MIVEGEC, CNRS 5290, IRD 224, Université de Montpellier, Montpellier, France
| | - Nicolas Comte
- INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Adrián A Davín
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution and Environment, University College London, London, UK.,Department of Computer Science, University College London, London, UK.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - David Dylus
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Damir Hasic
- Department of Mathematics, Faculty of Science, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
| | - Diego Mallo
- Virginia G. Piper Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Rémi Planel
- Laboratoire d'Analyse Bio-informatique en Génomique et Métabolisme CNRS-UMR 8030, Commissariat à l'Énergie Atomique (CEA), Institut de Génomique, Genoscope, Evry, France
| | - David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
| | - Celine Scornavacca
- Institut de Biologie Computationnelle (IBC), Montpellier, France.,ISEM, CNRS, Université de Montpellier, IRD, EPHE, Montpellier, France
| | - Gergely Szöllosi
- MTA-ELTE Lendület Evolutionary Genomics Research Group, Budapest, Hungary.,Department of Biological Physics, Eötvös Loránd University, Budapest, Hungary
| | - Louxin Zhang
- Department of Mathematics, National University of Singapore, Singapore, Singapore
| | - Éric Tannier
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France.,INRIA Grenoble Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, F-69622 Villeurbanne, France
| |
Collapse
|
5
|
Gamble T. Duplications in Corneous Beta Protein Genes and the Evolution of Gecko Adhesion. Integr Comp Biol 2019; 59:193-202. [DOI: 10.1093/icb/icz010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Abstract
Corneous proteins are an important component of the tetrapod integument. Duplication and diversification of keratins and associated proteins are linked with the origin of most novel integumentary structures like mammalian hair, avian feathers, and scutes covering turtle shells. Accordingly, the loss of integumentary structures often coincides with the loss of genes encoding keratin and associated proteins. For example, many hair keratins in dolphins and whales have become pseudogenes. The adhesive setae of geckos and anoles are composed of both intermediate filament keratins (IF-keratins, formerly known as alpha-keratins) and corneous beta-proteins (CBPs, formerly known as beta-keratins) and recent whole genome assemblies of two gecko species and an anole uncovered duplications in seta-specific CBPs in each of these lineages. While anoles evolved adhesive toepads just once, there are two competing hypotheses about the origin(s) of digital adhesion in geckos involving either a single origin or multiple origins. Using data from three published gecko genomes, I examine CBP gene evolution in geckos and find support for a hypothesis where CBP gene duplications are associated with the repeated evolution of digital adhesion. Although these results are preliminary, I discuss how additional gecko genome assemblies, combined with phylogenies of keratin and associated protein genes and gene duplication models, can provide rigorous tests of several hypotheses related to gecko CBP evolution. This includes a taxon sampling strategy for sequencing and assembly of gecko genomes that could help resolve competing hypotheses surrounding the origin(s) of digital adhesion.
Collapse
Affiliation(s)
- Tony Gamble
- Department of Biological Sciences, Marquette University, Milwaukee, WI 53201, USA
- Bell Museum of Natural History, University of Minnesota, Saint Paul, MN 55113, USA
- Milwaukee Public Museum, Milwaukee, WI 53233, USA
| |
Collapse
|
6
|
Abstract
The correct interpretation of any phylogenetic tree is dependent on that tree being correctly rooted. We present STRIDE, a fast, effective, and outgroup-free method for identification of gene duplication events and species tree root inference in large-scale molecular phylogenetic analyses. STRIDE identifies sets of well-supported in-group gene duplication events from a set of unrooted gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of its root. We show that STRIDE correctly identifies the root of the species tree in multiple large-scale molecular phylogenetic data sets spanning a wide range of timescales and taxonomic groups. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for data sets where information is limited. Furthermore, application of STRIDE to outgroup-free inference of the origin of the eukaryotic tree resulted in a root probability distribution that provides additional support for leading hypotheses for the origin of the eukaryotes.
Collapse
Affiliation(s)
- David M Emms
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom
| |
Collapse
|
7
|
Affiliation(s)
- Casey W. Dunn
- Department of Ecology and Evolutionary Biology Brown University 80 Waterman St Providence RIUSA
| | - Catriona Munro
- Department of Ecology and Evolutionary Biology Brown University 80 Waterman St Providence RIUSA
| |
Collapse
|
8
|
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Collapse
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Eric Tannier
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Bastien Boussau
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France; ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France;
| |
Collapse
|