1
Gàlvez-Morante A, Guéguen L, Natsidis P, Telford MJ, Richter DJ. Dollo Parsimony Overestimates Ancestral Gene Content Reconstructions. Genome Biol Evol 2024; 16:evae062. [PMID: 38518756 PMCID: PMC10995720 DOI: 10.1093/gbe/evae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 03/15/2024] [Accepted: 03/19/2024] [Indexed: 03/24/2024] Open
Abstract
Ancestral reconstruction is a widely used technique that has been applied to understand the evolutionary history of gain and loss of gene families. Ancestral gene content can be reconstructed via different phylogenetic methods, but many current and previous studies employ Dollo parsimony. We hypothesize that Dollo parsimony is not appropriate for ancestral gene content reconstruction inferences based on sequence homology, as Dollo parsimony is derived from the assumption that a complex character cannot be regained. This premise does not accurately model molecular sequence evolution, in which false orthology can result from sequence convergence or lateral gene transfer. The aim of this study is to test Dollo parsimony's suitability for ancestral gene content reconstruction and to compare its inferences with a maximum likelihood-based approach that allows a gene family to be gained more than once within a tree. We first compared the performance of the two approaches on a series of artificial data sets each of 5,000 genes that were simulated according to a spectrum of evolutionary rates without gene gain or loss, so that inferred deviations from the true gene count would arise only from errors in orthology inference and ancestral reconstruction. Next, we reconstructed protein domain evolution on a phylogeny representing known eukaryotic diversity. We observed that Dollo parsimony produced numerous ancestral gene content overestimations, especially at nodes closer to the root of the tree. These observations led us to the conclusion that, confirming our hypothesis, Dollo parsimony is not an appropriate method for ancestral reconstruction studies based on sequence homology.
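The overestimation described in this abstract can be illustrated with a minimal sketch of the Dollo rule. It assumes a hypothetical four-species tree written as nested tuples and a gene that is only allowed to arise once; the function names and the toy data are illustrative assumptions, not the authors' pipeline.

```python
def leaf_set(node):
    """Leaf names below a node; a leaf is a string, an internal node a tuple of children."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def dollo_carriers(tree, present):
    """Internal nodes (identified by their leaf sets) inferred to carry the gene under Dollo parsimony."""
    present = frozenset(present)
    carriers = set()

    def visit(node, below_gain):
        if isinstance(node, str) or not (leaf_set(node) & present):
            return  # leaves are observed directly; clades without any hit never carry the gene
        if not below_gain:
            # The single allowed gain sits at the most recent common ancestor of all hits:
            # the first node, walking down from the root, where hits fall in two or more children.
            children_with_hits = [c for c in node if leaf_set(c) & present]
            below_gain = len(children_with_hits) >= 2
        if below_gain:
            # No regain is allowed, so every node on a path from the gain to a hit carries the gene.
            carriers.add(leaf_set(node))
        for child in node:
            visit(child, below_gain)

    visit(tree, False)
    return carriers


TREE = (("A", "B"), ("C", "D"))  # hypothetical four-species tree
clean = dollo_carriers(TREE, {"A", "B"})       # gene detected only in A and B
noisy = dollo_carriers(TREE, {"A", "B", "D"})  # same gene plus one spurious hit in D
print(len(clean), len(noisy))
```

In this toy case a single spurious hit in D moves the inferred gain from the ancestor of A and B up to the root, so three ancestral nodes are scored as carrying the gene instead of one.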
Affiliation(s)
- Alex Gàlvez-Morante
  - Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
- Laurent Guéguen
  - LBBE, UMR 5558, CNRS, Université Claude Bernard Lyon 1, Villeurbanne 69622, France
- Paschalis Natsidis
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Maximilian J Telford
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Daniel J Richter
  - Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
2
Barrera-Redondo J, Lotharukpong JS, Drost HG, Coelho SM. Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biol 2023; 24:54. [PMID: 36964572 PMCID: PMC10037820 DOI: 10.1186/s13059-023-02895-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 03/10/2023] [Indexed: 03/26/2023] Open
Abstract
We present GenEra ( https://github.com/josuebarrera/GenEra ), a DIAMOND-fueled gene-family founder inference framework that addresses previously raised limitations and biases in genomic phylostratigraphy, such as homology detection failure. GenEra also reduces computational time from several months to a few days for any genome of interest. We analyze the emergence of taxonomically restricted gene families during major evolutionary transitions in plants, animals, and fungi. Our results indicate that the impact of homology detection failure on inferred patterns of gene emergence is lineage-dependent, suggesting that plants are more prone to evolve novelty through the emergence of new genes compared to animals and fungi.
Affiliation(s)
- Josué Barrera-Redondo
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Jaruwatana Sodai Lotharukpong
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Hajk-Georg Drost
  - Computational Biology Group, Department of Molecular Biology, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
- Susana M Coelho
  - Department of Algal Development and Evolution, Max Planck Institute for Biology, Max-Planck-Ring 5, 72076, Tübingen, Germany
3
Jiang M, Li X, Dong X, Zu Y, Zhan Z, Piao Z, Lang H. Research Advances and Prospects of Orphan Genes in Plants. Front Plant Sci 2022; 13:947129. [PMID: 35874010 PMCID: PMC9305701 DOI: 10.3389/fpls.2022.947129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Accepted: 06/23/2022] [Indexed: 06/15/2023]
Abstract
Orphan genes (OGs) are defined as genes having no sequence similarity with genes present in other lineages. OGs have been regarded to play a key role in the development of lineage-specific adaptations and can also serve as a constant source of evolutionary novelty. These genes have often been found related to various stress responses, species-specific traits, special expression regulation, and also participate in primary substance metabolism. The advancement in sequencing tools and genome analysis methods has made the identification and characterization of OGs comparatively easier. In the study of OG functions in plants, significant progress has been made. We review recent advances in the fast evolving characteristics, expression modulation, and functional analysis of OGs with a focus on their role in plant biology. We also emphasize current challenges, adoptable strategies and discuss possible future directions of functional study of OGs.
Affiliation(s)
- Mingliang Jiang
  - School of Agriculture, Jilin Agricultural Science and Technology College, Jilin, China
- Xiaonan Li
  - College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Xiangshu Dong
  - School of Agriculture, Yunnan University, Kunming, China
- Ye Zu
  - College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Zongxiang Zhan
  - College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Zhongyun Piao
  - College of Horticulture, Shenyang Agricultural University, Shenyang, China
- Hong Lang
  - School of Agriculture, Jilin Agricultural Science and Technology College, Jilin, China
4
Blevins WR. Identification of Taxonomically Restricted Transcripts from Illumina RNA Sequencing Data. Methods Mol Biol 2022; 2477:91-103. [PMID: 35524114 DOI: 10.1007/978-1-0716-2257-5_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In order to perform a well-balanced comparative transcriptomic analysis, the reference genome and annotations for all species included in the comparison must be of a similar quality and completeness. Frequently, comparative transcriptomic analyses include non-model organisms whose annotations are not as well curated; this inequality can lead to biases. To avoid potential biases stemming from incomplete annotations, a comparative transcriptomic analysis can incorporate de novo transcriptome assemblies for each species, which reduces this disparity. This chapter covers all of the steps which are necessary to run a comparative transcriptomic analysis with de novo transcriptome assemblies, from the first step of the experimental design to the sequencing, and ultimately the bioinformatic analysis.
Affiliation(s)
- William R Blevins
  - Single Cell Genomics Group, Centro Nacional de Análisis Genómico (CNAG), Barcelona, Spain
5
Rivard EL, Ludwig AG, Patel PH, Grandchamp A, Arnold SE, Berger A, Scott EM, Kelly BJ, Mascha GC, Bornberg-Bauer E, Findlay GD. A putative de novo evolved gene required for spermatid chromatin condensation in Drosophila melanogaster. PLoS Genet 2021; 17:e1009787. [PMID: 34478447 PMCID: PMC8445463 DOI: 10.1371/journal.pgen.1009787] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/29/2021] [Revised: 09/16/2021] [Accepted: 08/19/2021] [Indexed: 02/07/2023] Open
Abstract
Comparative genomics has enabled the identification of genes that potentially evolved de novo from non-coding sequences. Many such genes are expressed in male reproductive tissues, but their functions remain poorly understood. To address this, we conducted a functional genetic screen of over 40 putative de novo genes with testis-enriched expression in Drosophila melanogaster and identified one gene, atlas, required for male fertility. Detailed genetic and cytological analyses showed that atlas is required for proper chromatin condensation during the final stages of spermatogenesis. Atlas protein is expressed in spermatid nuclei and facilitates the transition from histone- to protamine-based chromatin packaging. Complementary evolutionary analyses revealed the complex evolutionary history of atlas. The protein-coding portion of the gene likely arose at the base of the Drosophila genus on the X chromosome but was unlikely to be essential, as it was then lost in several independent lineages. Within the last ~15 million years, however, the gene moved to an autosome, where it fused with a conserved non-coding RNA and evolved a non-redundant role in male fertility. Altogether, this study provides insight into the integration of novel genes into biological processes, the links between genomic innovation and functional evolution, and the genetic control of a fundamental developmental process, gametogenesis.
Affiliation(s)
- Emily L. Rivard
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
- Andrew G. Ludwig
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
- Prajal H. Patel
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
- Sarah E. Arnold
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
- Emilie M. Scott
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
- Brendan J. Kelly
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
- Grace C. Mascha
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
- Erich Bornberg-Bauer
  - University of Münster, Münster, Germany
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
- Geoffrey D. Findlay
  - College of the Holy Cross, Worcester, Massachusetts, United States of America
6
Lineweaver CH, Bussey KJ, Blackburn AC, Davies PCW. Cancer progression as a sequence of atavistic reversions. Bioessays 2021; 43:e2000305. [PMID: 33984158 DOI: 10.1002/bies.202000305] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 11/19/2020] [Revised: 03/26/2021] [Accepted: 03/29/2021] [Indexed: 12/27/2022]
Abstract
It has long been recognized that cancer onset and progression represent a type of reversion to an ancestral quasi-unicellular phenotype. This general concept has been refined into the atavistic model of cancer that attempts to provide a quantitative analysis and testable predictions based on genomic data. Over the past decade, support for the multicellular-to-unicellular reversion predicted by the atavism model has come from phylostratigraphy. Here, we propose that cancer onset and progression involve more than a one-off multicellular-to-unicellular reversion, and are better described as a series of reversionary transitions. We make new predictions based on the chronology of the unicellular-eukaryote-to-multicellular-eukaryote transition. We also make new predictions based on three other evolutionary transitions that occurred in our lineage: eukaryogenesis, oxidative phosphorylation and the transition to adaptive immunity. We propose several modifications to current phylostratigraphy to improve age resolution to test these predictions. Also see the video abstract here: https://youtu.be/3unEu5JYJrQ.
Affiliation(s)
- Charles H Lineweaver
  - Planetary Science Institute, Research School of Astronomy and Astrophysics & Research School of Earth Sciences, The Australian National University, Canberra, ACT, Australia
  - Mt Stromlo Observatory, Canberra, ACT, Australia
- Kimberly J Bussey
  - Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe, Arizona, USA
  - Precision Medicine, Midwestern University, Glendale, Arizona, USA
- Anneke C Blackburn
  - The John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- Paul C W Davies
  - Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe, Arizona, USA
7
Structure and function of naturally evolved de novo proteins. Curr Opin Struct Biol 2021; 68:175-183. [PMID: 33567396 DOI: 10.1016/j.sbi.2020.11.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 10/01/2020] [Revised: 11/16/2020] [Accepted: 11/27/2020] [Indexed: 01/05/2023]
Abstract
Comparative evolutionary genomics has revealed that novel protein coding genes can emerge randomly from non-coding DNA. While most of the myriad of transcripts which continuously emerge vanish rapidly, some attain regulatory regions, become translated and survive. More surprisingly, sequence properties of de novo proteins are almost indistinguishable from randomly obtained sequences, yet de novo proteins may gain functions and integrate into eukaryotic cellular networks quite easily. We here discuss current knowledge on de novo proteins, their structures, functions and evolution. Since the existence of de novo proteins seems at odds with decade-long attempts to construct proteins with novel structures and functions from scratch, we suggest that a better understanding of de novo protein evolution may fuel new strategies for protein design.
8
James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. Universal and taxon-specific trends in protein sequences as a function of age. eLife 2021; 10:e57347. [PMID: 33416492 PMCID: PMC7819706 DOI: 10.7554/elife.57347] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/28/2020] [Accepted: 01/05/2021] [Indexed: 01/12/2023] Open
Abstract
Extant protein-coding sequences span a huge range of ages, from those that emerged only recently to those present in the last universal common ancestor. Because evolution has had less time to act on young sequences, there might be 'phylostratigraphy' trends in any properties that evolve slowly with age. A long-term reduction in hydrophobicity and hydrophobic clustering was found in previous, taxonomically restricted studies. Here we perform integrated phylostratigraphy across 435 fully sequenced species, using sensitive HMM methods to detect protein domain homology. We find that the reduction in hydrophobic clustering is universal across lineages. However, only young animal domains have a tendency to have higher structural disorder. Among ancient domains, trends in amino acid composition reflect the order of recruitment into the genetic code, suggesting that the composition of the contemporary descendants of ancient sequences reflects amino acid availability during the earliest stages of life, when these sequences first emerged.
Affiliation(s)
- Jennifer E James
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Sara M Willis
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Paul G Nelson
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Catherine Weibel
  - Department of Physics, University of Arizona, Tucson, United States
  - Department of Mathematics, University of Arizona, Tucson, United States
- Luke J Kosinski
  - Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Joanna Masel
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
9
Futo M, Opašić L, Koska S, Čorak N, Široki T, Ravikumar V, Thorsell A, Lenuzzi M, Kifer D, Domazet-Lošo M, Vlahoviček K, Mijakovic I, Domazet-Lošo T. Embryo-Like Features in Developing Bacillus subtilis Biofilms. Mol Biol Evol 2021; 38:31-47. [PMID: 32871001 PMCID: PMC7783165 DOI: 10.1093/molbev/msaa217] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/11/2022] Open
Abstract
Correspondence between evolution and development has been discussed for more than two centuries. Recent work reveals that phylogeny-ontogeny correlations are indeed present in developmental transcriptomes of eukaryotic clades with complex multicellularity. Nevertheless, it has been largely ignored that the pervasive presence of phylogeny-ontogeny correlations is a hallmark of development in eukaryotes. This perspective opens a possibility to look for similar parallelisms in biological settings where developmental logic and multicellular complexity are more obscure. For instance, it has been increasingly recognized that multicellular behavior underlies biofilm formation in bacteria. However, it remains unclear whether bacterial biofilm growth shares some basic principles with development in complex eukaryotes. Here we show that the ontogeny of growing Bacillus subtilis biofilms recapitulates phylogeny at the expression level. Using time-resolved transcriptome and proteome profiles, we found that biofilm ontogeny correlates with the evolutionary measures, in a way that evolutionary younger and more diverged genes were increasingly expressed toward later timepoints of biofilm growth. Molecular and morphological signatures also revealed that biofilm growth is highly regulated and organized into discrete ontogenetic stages, analogous to those of eukaryotic embryos. Together, this suggests that biofilm formation in Bacillus is a bona fide developmental process comparable to organismal development in animals, plants, and fungi. Given that most cells on Earth reside in the form of biofilms and that biofilms represent the oldest known fossils, we anticipate that the widely adopted vision of the first life as a single-cell and free-living organism needs rethinking.
Affiliation(s)
- Momir Futo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Luka Opašić
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
  - Department for Evolutionary Theory, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Sara Koska
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Nina Čorak
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Tin Široki
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
- Vaishnavi Ravikumar
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
- Annika Thorsell
  - Proteomics Core Facility, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Maša Lenuzzi
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
  - Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Domagoj Kifer
  - Faculty of Pharmacy and Biochemistry, University of Zagreb, Zagreb, Croatia
- Mirjana Domazet-Lošo
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
- Kristian Vlahoviček
  - Bioinformatics Group, Division of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
  - School of Biosciences, University of Skövde, Skövde, Sweden
- Ivan Mijakovic
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Systems and Synthetic Biology Division, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Tomislav Domazet-Lošo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
  - Catholic University of Croatia, Zagreb, Croatia
10
Dowling D, Schmitz JF, Bornberg-Bauer E. Stochastic Gain and Loss of Novel Transcribed Open Reading Frames in the Human Lineage. Genome Biol Evol 2020; 12:2183-2195. [PMID: 33210146 PMCID: PMC7674706 DOI: 10.1093/gbe/evaa194] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Accepted: 09/12/2020] [Indexed: 12/12/2022] Open
Abstract
In addition to known genes, much of the human genome is transcribed into RNA. Chance formation of novel open reading frames (ORFs) can lead to the translation of myriad new proteins. Some of these ORFs may yield advantageous adaptive de novo proteins. However, widespread translation of noncoding DNA can also produce hazardous protein molecules, which can misfold and/or form toxic aggregates. The dynamics of how de novo proteins emerge from potentially toxic raw materials and what influences their long-term survival are unknown. Here, using transcriptomic data from human and five other primates, we generate a set of transcribed human ORFs at six conservation levels to investigate which properties influence the early emergence and long-term retention of these expressed ORFs. As these taxa diverged from each other relatively recently, we present a fine scale view of the evolution of novel sequences over recent evolutionary time. We find that novel human-restricted ORFs are preferentially located on GC-rich gene-dense chromosomes, suggesting their retention is linked to pre-existing genes. Sequence properties such as intrinsic structural disorder and aggregation propensity (which have been proposed to play a role in survival of de novo genes) remain unchanged over time. Even very young sequences code for proteins with low aggregation propensities, suggesting that genomic regions with many novel transcribed ORFs are concomitantly less likely to produce ORFs which code for harmful toxic proteins. Our data indicate that the survival of these novel ORFs is largely stochastic rather than shaped by selection.
Affiliation(s)
- Daniel Dowling
  - Institute for Evolution and Biodiversity, University of Münster, Germany
- Jonathan F Schmitz
  - Institute for Evolution and Biodiversity, University of Münster, Germany
11
Weisman CM, Murray AW, Eddy SR. Many, but not all, lineage-specific genes can be explained by homology detection failure. PLoS Biol 2020; 18:e3000862. [PMID: 33137085 PMCID: PMC7660931 DOI: 10.1371/journal.pbio.3000862] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 02/27/2020] [Revised: 11/12/2020] [Accepted: 09/21/2020] [Indexed: 12/21/2022] Open
Abstract
Genes for which homologs can be detected only in a limited group of evolutionarily related species, called “lineage-specific genes,” are pervasive: Essentially every lineage has them, and they often comprise a sizable fraction of the group’s total genes. Lineage-specific genes are often interpreted as “novel” genes, representing genetic novelty born anew within that lineage. Here, we develop a simple method to test an alternative null hypothesis: that lineage-specific genes do have homologs outside of the lineage that, even while evolving at a constant rate in a novelty-free manner, have merely become undetectable by search algorithms used to infer homology. We show that this null hypothesis is sufficient to explain the lack of detected homologs of a large number of lineage-specific genes in fungi and insects. However, we also find that a minority of lineage-specific genes in both clades are not well explained by this novelty-free model. The method provides a simple way of identifying which lineage-specific genes call for special explanations beyond homology detection failure, highlighting them as interesting candidates for further study. Lineage-specific gene families may arise from evolutionary innovations such as de novo gene origination, or may simply mean that a similarity search program failed to identify more distant homologs. A new computational method for modeling the expected decay of similarity search scores with evolutionary distance allows distinction between the two explanations.
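The idea of modeling how similarity scores decay with evolutionary distance can be sketched as follows. The abstract does not state the functional form, so the exponential decay used here is an assumption made purely for illustration, as are the function names, the synthetic scores, and the detection threshold. A real analysis would also need to account for uncertainty in the prediction, which this sketch ignores.

```python
import math


def fit_exponential_decay(distances, scores):
    """Least-squares fit of ln(score) = ln(a) - b * distance; returns (a, b)."""
    logs = [math.log(s) for s in scores]
    n = len(distances)
    mean_t = sum(distances) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in distances)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(distances, logs))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_t), -slope


def predicted_score(a, b, distance):
    """Score expected at a given distance under the assumed decay model."""
    return a * math.exp(-b * distance)


# Synthetic scores for one gene in species where a homolog WAS detected (hypothetical values).
near_distances = [0.1, 0.3, 0.5, 0.8]
near_scores = [1000 * math.exp(-2.0 * t) for t in near_distances]

a, b = fit_exponential_decay(near_distances, near_scores)

DETECTION_THRESHOLD = 40.0  # hypothetical minimum score a search would report
far_score = predicted_score(a, b, 3.0)  # extrapolate to a distant species with no detected homolog

# If the extrapolated score is already below the threshold, the missing hit needs no
# explanation beyond ordinary divergence under this model.
explained_by_detection_failure = far_score < DETECTION_THRESHOLD
print(round(far_score, 2), explained_by_detection_failure)
```

Genes whose extrapolated score stays above the threshold in a species where no homolog is found are the ones such a null model fails to explain.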
Affiliation(s)
- Caroline M. Weisman
  - Department of Molecular & Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Andrew W. Murray
  - Department of Molecular & Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Sean R. Eddy
  - Department of Molecular & Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
  - Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts, United States of America
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
12
Arendsee Z, Li J, Singh U, Seetharam A, Dorman K, Wurtele ES. phylostratr: a framework for phylostratigraphy. Bioinformatics 2020; 35:3617-3627. [PMID: 30873536 DOI: 10.1093/bioinformatics/btz171] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/01/2018] [Revised: 02/27/2019] [Accepted: 03/13/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: The goal of phylostratigraphy is to infer the evolutionary origin of each gene in an organism. This is done by searching for homologs within increasingly broad clades. The deepest clade that contains a homolog of the protein(s) encoded by a gene is that gene's phylostratum.
RESULTS: We have created a general R-based framework, phylostratr, to estimate the phylostratum of every gene in a species. The program fully automates analysis: selecting species for balanced representation, retrieving sequences, building databases, inferring phylostrata and returning diagnostics. Key diagnostics include: detection of genes with inferred homologs in old clades, but not intermediate ones; proteome quality assessments; false-positive diagnostics, and checks for missing organellar genomes. phylostratr allows extensive customization and systematic comparisons of the influence of analysis parameters or genomes on phylostrata inference. A user may: modify the automatically generated clade tree or use their own tree; provide custom sequences in place of those automatically retrieved from UniProt; replace BLAST with an alternative algorithm; or tailor the method and sensitivity of the homology inference classifier. We show the utility of phylostratr through case studies in Arabidopsis thaliana and Saccharomyces cerevisiae.
AVAILABILITY AND IMPLEMENTATION: Source code available at https://github.com/arendsee/phylostratr.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
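The phylostratum rule stated in this abstract (the deepest clade containing a homolog) can be sketched in a few lines. This is an illustrative Python sketch, not the phylostratr R API: the simplified lineage, the species table, and the function name are all assumptions chosen for the example.

```python
# Simplified lineage of the focal species, ordered from youngest to oldest clade (illustrative).
LINEAGE = [
    "Arabidopsis thaliana",
    "Brassicaceae",
    "Viridiplantae",
    "Eukaryota",
    "cellular organisms",
]

# Youngest clade in LINEAGE that each searched species shares with the focal species.
SHARED_CLADE = {
    "Arabidopsis lyrata": "Brassicaceae",
    "Oryza sativa": "Viridiplantae",
    "Homo sapiens": "Eukaryota",
    "Escherichia coli": "cellular organisms",
}


def phylostratum(species_with_homolog):
    """Return the deepest clade of LINEAGE in which a homolog of the gene was found."""
    depth = 0  # with no hits elsewhere, the gene is assigned to the focal species itself
    for species in species_with_homolog:
        depth = max(depth, LINEAGE.index(SHARED_CLADE[species]))
    return LINEAGE[depth]


print(phylostratum({"Arabidopsis lyrata", "Oryza sativa"}))
print(phylostratum(set()))
```

Because the assignment depends only on the single deepest hit, one false-positive match in a distant species is enough to push a gene into a much older stratum, which is one reason diagnostics for such hits matter.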
Affiliation(s)
- Zebulun Arendsee
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, USA
- Jing Li
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
- Urminder Singh
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
- Arun Seetharam
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
  - Genome Informatics Facility, Iowa State University, Ames, IA, USA
- Karin Dorman
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
  - Department of Statistics, Iowa State University, Ames, IA, USA
- Eve Syrkin Wurtele
  - Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
  - Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, USA
13
Fernández R, Gabaldón T. Gene gain and loss across the metazoan tree of life. Nat Ecol Evol 2020; 4:524-533. [PMID: 31988444 PMCID: PMC7124887 DOI: 10.1038/s41559-019-1069-x] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 04/29/2019] [Accepted: 11/21/2019] [Indexed: 12/22/2022]
Abstract
Although recent research has revealed high genomic complexity in the earliest-splitting animals and their ancestors, the macroevolutionary trends orchestrating gene repertoire evolution throughout the animal phyla remain poorly understood. We used a phylogenomic approach to interrogate genome evolution across all animal phyla. Our analysis uncovered a bimodal distribution of recruitment of orthologous genes, with most genes gained very 'early' (that is, at deep nodes) or very 'late', representing lineage-specific acquisitions. The emergence of animals was characterized by high values of gene birth and duplications. Deuterostomes, ecdysozoans and Xenacoelomorpha were characterized by no gene gain but rampant differential gene loss. Genes considered as animal hallmarks, such as Notch/Delta, were convergently duplicated in all phyla and at different evolutionary depths. Genes duplicated in all nodes from Metazoa to phylum-specific levels were enriched in functions related to the neural system, suggesting that this system has been continuously and independently reshaped throughout evolution across animals. Our results indicate that animal genomes evolved by unparalleled gene duplication followed by differential gene loss, and provide an atlas of gene repertoire evolution throughout the animal tree of life to navigate how, when and how often each gene in each genome was gained, duplicated or lost.
Affiliation(s)
- Rosa Fernández
- Bioinformatics and Genomics Unit, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Barcelona Supercomputing Centre (BSC-CNS) and Institute for Research in Biomedicine (IRB), Barcelona, Spain
- Toni Gabaldón
- Bioinformatics and Genomics Unit, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Barcelona Supercomputing Centre (BSC-CNS) and Institute for Research in Biomedicine (IRB), Barcelona, Spain
| |
|
14
|
Wu L, Ferger KE, Lambert JD. Gene Expression Does Not Support the Developmental Hourglass Model in Three Animals with Spiralian Development. Mol Biol Evol 2019; 36:1373-1383. [DOI: 10.1093/molbev/msz065] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022] Open
Abstract
It has been proposed that animals have a pattern of developmental evolution resembling an hourglass because the most conserved development stage—often called the phylotypic stage—is always in midembryonic development. Although the topic has been debated for decades, recent studies using molecular data such as RNA-seq gene expression data sets have largely supported the existence of periods of relative evolutionary conservation in middevelopment, consistent with the phylotypic stage and the hourglass concepts. However, so far this approach has only been applied to a limited number of taxa across the tree of life. Here, using established phylotranscriptomic approaches, we found a surprising reverse hourglass pattern in two molluscs and a polychaete annelid, representatives of the Spiralia, an understudied group that contains a large fraction of metazoan body plan diversity. These results suggest that spiralians have a divergent midembryonic stage, with more conserved early and late development, which is the inverse of the pattern seen in almost all other organisms where these phylotranscriptomic approaches have been reported. We discuss our findings in light of proposed reasons for the phylotypic stage and hourglass model in other systems.
Affiliation(s)
- Longjun Wu
- Department of Biology, University of Rochester, Rochester, NY
- Kailey E Ferger
- Department of Biology, University of Rochester, Rochester, NY
- J David Lambert
- Department of Biology, University of Rochester, Rochester, NY
| |
|
15
|
Zhang L, Tan Y, Fan S, Zhang X, Zhang Z. Phylostratigraphic analysis of gene co-expression network reveals the evolution of functional modules for ovarian cancer. Sci Rep 2019; 9:2623. [PMID: 30796309 PMCID: PMC6384884 DOI: 10.1038/s41598-019-40023-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/29/2018] [Accepted: 01/23/2019] [Indexed: 01/06/2023] Open
Abstract
Ovarian cancer (OV) is an extremely lethal disease. However, the evolutionary machineries of OV are still largely unknown. Here, we used a method that combines phylostratigraphy information with gene co-expression networks to extensively study the evolutionary compositions of OV. The present co-expression network construction yielded 18,549 nodes and 114,985 edges based on 307 OV expression samples obtained from the Genome Data Analysis Centers database. A total of 20 modules were identified as OV related clusters. The human genome sequences were divided into 19 phylostrata (PS), the majority (67.45%) of OV genes was already present in the eukaryotic ancestor. There were two strong peaks of the emergence of OV genes screened by hypergeometric test: the evolution of the multicellular metazoan organisms (PS5 and PS6, P value = 0.002) and the emergence of bony fish (PS11 and PS12, P value = 0.009). Hence, the origin of OV is far earlier than its emergence. The integrated analysis of the topology of OV modules and the phylogenetic data revealed an evolutionary pattern of OV in human, namely, OV modules have arisen step by step during the evolution of the respective lineages. New genes have evolved and become locked into a pathway, where more and more biological pathways are fixed into OV modules by recruiting new genes during human evolution.
Affiliation(s)
- Luoyan Zhang
- Key Lab of Plant Stress Research, College of Life Science, Shandong Normal University, Jinan, 250014, Shandong, China
- Yi Tan
- Qilu Cell Therapy Technology Co., Ltd, Jinan, 250000, Shandong, China
- Shoujin Fan
- Key Lab of Plant Stress Research, College of Life Science, Shandong Normal University, Jinan, 250014, Shandong, China
- Xuejie Zhang
- Key Lab of Plant Stress Research, College of Life Science, Shandong Normal University, Jinan, 250014, Shandong, China
- Zhen Zhang
- Laboratory for Molecular Immunology, Institute of Basic Medicine, Shandong Academy of Medical Sciences, Jinan, 250062, Shandong, China
| |
|