1
Islam M, Behura SK. Role of paralogs in the sex-bias transcriptional and metabolic regulation of the brain-placental axis in mice. Placenta 2024; 145:143-150. [PMID: 38134547 DOI: 10.1016/j.placenta.2023.12.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 12/12/2023] [Accepted: 12/14/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION: Duplicated genes, or paralogs, play important roles in the adaptive function of eukaryotic genomes. Animal studies have shown evidence for the functional role of paralogs in pregnancy, but our knowledge about the role of paralogs in fetoplacental regulation remains limited. In particular, whether fetoplacental metabolic regulation is modulated by differential expression of paralogs remains unexamined. METHODS: In this study, gene expression profiles of day-15 placenta and fetal brain were compared to identify families or groups of paralogous genes expressed in the placenta and brain of male versus female fetuses in mice. Bayesian modeling was applied to infer the directional relationship of transcriptional variation of the paralogs relative to the phylogenetic variation of the genes in each family. Gas chromatography-mass spectrometry (GC-MS) was used to perform untargeted metabolomics analysis of day-15 placenta and fetal brain of both sexes. RESULTS: We identified paralog groups that were expressed in a sex- and/or tissue-biased manner between the placenta and fetal brain. Bayesian modeling showed evidence for a directional relationship between expression and phylogeny of specific paralogs. These relationships were sex-specific. GC-MS analysis identified metabolites that were expressed in a sex-biased manner between the placenta and fetal brain. By performing integrative analysis of the metabolomics and gene expression data, we showed that specific groups of metabolites and paralogous genes were expressed in a coordinated manner between the placenta and fetal brain. DISCUSSION: The findings of this study collectively suggest that paralogs play an influential role in the regulation of the brain-placental axis in mice.
Affiliation(s)
- Maliha Islam
  - Division of Animal Sciences, University of Missouri, 920 East Campus Drive, Columbia, Missouri, 65211, USA
- Susanta K Behura
  - Division of Animal Sciences, University of Missouri, 920 East Campus Drive, Columbia, Missouri, 65211, USA; MU Institute for Data Science and Informatics, University of Missouri, USA; Interdisciplinary Reproduction and Health Group, University of Missouri, USA; Interdisciplinary Neuroscience Program, University of Missouri, USA
2
Kasianov AS, Klepikova AV, Mayorov AV, Buzanov GS, Logacheva MD, Penin AA. Interspecific comparison of gene expression profiles using machine learning. PLoS Comput Biol 2023; 19:e1010743. [PMID: 36626392 PMCID: PMC9879537 DOI: 10.1371/journal.pcbi.1010743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2021] [Revised: 01/26/2023] [Accepted: 11/16/2022] [Indexed: 01/11/2023] [Open Access]
Abstract
Interspecific gene comparisons are the keystones for many areas of biological research and are especially important for the translation of knowledge from model organisms to economically important species. Currently they are hampered by the low resolution of methods based on sequence analysis and by the complex evolutionary history of eukaryotic genes. This is especially critical for plants, whose genomes are shaped by multiple whole genome duplications and subsequent gene loss. This requires the development of new methods for comparing the functions of genes in different species. Here, we report ISEEML (Interspecific Similarity of Expression Evaluated using Machine Learning), a novel machine learning-based algorithm for interspecific gene classification. In contrast to previous studies focused on sequence similarity, our algorithm focuses on functional similarity inferred from the comparison of gene expression profiles. We propose a novel metric for expression pattern similarity, the expression score (ES), that is suitable for species with differing morphologies. As a proof of concept, we compare detailed transcriptome maps of Arabidopsis thaliana, the model species, Zea mays (maize) and Fagopyrum esculentum (common buckwheat), which are species that represent distant clades within flowering plants. The classifier resulted in an AUC of 0.91; under the ES threshold of 0.5, the specificity was 94%, and sensitivity was 72%.
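The specificity and sensitivity quoted above depend on where the score threshold is placed. A minimal sketch of that calculation follows. The helper name `sensitivity_specificity`, the toy scores and labels, and the choice to treat a score equal to the threshold as a positive call are all assumptions for illustration; none of them come from the paper.

```python
import math


def sensitivity_specificity(scores, labels, threshold=0.5):
    """Return (sensitivity, specificity) for scores cut at a threshold.

    Hypothetical helper: a score >= threshold counts as a positive call.
    """
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    # Guard against an empty class so the ratio is never a division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy data only: three true pairs and three false pairs with invented scores.
toy_scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
toy_labels = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Raising the threshold generally trades sensitivity for specificity, which is why a single operating point such as 0.5 is reported alongside the threshold-independent AUC.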
Affiliation(s)
- Artem S. Kasianov
  - Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
- Anna V. Klepikova
  - Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
- Alexey V. Mayorov
  - Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
- Maria D. Logacheva
  - Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Aleksey A. Penin
  - Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia
3
Bastide P, Soneson C, Stern DB, Lespinet O, Gallopin M. A Phylogenetic Framework to Simulate Synthetic Interspecies RNA-Seq Data. Mol Biol Evol 2023; 40:msac269. [PMID: 36508357 PMCID: PMC11249980 DOI: 10.1093/molbev/msac269] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/18/2021] [Revised: 11/14/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] [Open Access]
Abstract
Interspecies RNA-Seq datasets are increasingly common, and have the potential to answer new questions about the evolution of gene expression. Single-species differential expression analysis is now a well-studied problem that benefits from sound statistical methods. Extensive reviews on biological or synthetic datasets have provided the community with a clear picture of the relative performance of the available methods in various settings. However, synthetic dataset simulation tools are still missing in the interspecies gene expression context. In this work, we develop and implement a new simulation framework. This tool builds on both the RNA-Seq and the phylogenetic comparative methods literatures to generate realistic count datasets, while taking into account the phylogenetic relationships between the samples. We illustrate the usefulness of this new framework through a targeted simulation study that reproduces the features of a recently published dataset containing gene expression data in adult eye tissue across blind and sighted freshwater crayfish species. Using our simulated datasets, we perform a fair comparison of several approaches used for differential expression analysis. This benchmark reveals some of the strengths and weaknesses of both the classical and phylogenetic approaches for interspecies differential expression analysis, and allows for a reanalysis of the crayfish dataset. The tool has been integrated into the R package compcodeR, freely available on Bioconductor.
Affiliation(s)
- Paul Bastide
  - IMAG, Université de Montpellier, CNRS, Montpellier, France
- Charlotte Soneson
  - Friedrich Miescher Institute for Biomedical Research, 4058 Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- David B Stern
  - Department of Integrative Biology, University of Wisconsin-Madison, 430 Lincoln Drive, Madison, WI 53706, USA
- Olivier Lespinet
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
- Mélina Gallopin
  - Institute for Integrative Biology of the Cell (I2BC), Université Paris-Saclay, CEA, CNRS, 91198 Gif-sur-Yvette, France
4
Suetsugu K, Fukushima K, Makino T, Ikematsu S, Sakamoto T, Kimura S. Transcriptomic heterochrony and completely cleistogamous flower development in the mycoheterotrophic orchid Gastrodia. New Phytol 2023; 237:323-338. [PMID: 36110047 DOI: 10.1111/nph.18495] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2022] [Accepted: 09/09/2022] [Indexed: 06/15/2023]
Abstract
Cleistogamy, in which plants can reproduce via self-fertilization within permanently closed flowers, has evolved in > 30 angiosperm lineages; however, consistent with Darwin's doubts about its existence, complete cleistogamy - the production of only cleistogamous flowers - has rarely been recognized. Thus far, the achlorophyllous orchid genus, Gastrodia, is the only known genus with several plausible completely cleistogamous species. Here, we analyzed the floral developmental transcriptomes of two recently evolved, completely cleistogamous Gastrodia species and their chasmogamous sister species to elucidate the possible changes involved in producing common cleistogamous traits. The ABBA-BABA test did not support introgression and protein sequence convergence as evolutionary mechanisms leading to cleistogamy, leaving convergence in gene expression as a plausible mechanism. Regarding transcriptomic differentiation, the two cleistogamous species had common modifications in the expression of developmental regulators, exhibiting a gene family-wide signature of convergent expression changes in MADS-box genes. Our transcriptomic pseudotime analysis revealed a prolonged juvenile state and eventual maturation, a heterochronic pattern consistent with partial neoteny, in cleistogamous flower development. These findings indicate that transcriptomic partial neoteny, arising from changes in the expression of conserved developmental regulators, might have contributed to the rapid and repeated evolution of cleistogamous flowers in Gastrodia.
Affiliation(s)
- Kenji Suetsugu
  - Department of Biology, Graduate School of Science, Kobe University, 1-1 Rokkodai, Nada-ku, Kobe, 657-8501, Japan
- Kenji Fukushima
  - Institute for Molecular Plant Physiology and Biophysics, University of Würzburg, Julius-von-Sachs Platz 2, 97082, Würzburg, Germany
- Takashi Makino
  - Graduate School of Life Sciences, Tohoku University, 6-3, Aramaki Aza Aoba, Aoba-ku, Sendai, 980-8578, Japan
- Shuka Ikematsu
  - Faculty of Life Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
  - Center for Plant Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
- Tomoaki Sakamoto
  - Faculty of Life Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
  - Center for Plant Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
- Seisuke Kimura
  - Faculty of Life Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
  - Center for Plant Sciences, Kyoto Sangyo University, Kamigamo-motoyama, Kita-ku, Kyoto, 603-8555, Japan
5
Reilly K, Ellis LJA, Davoudi HH, Supian S, Maia MT, Silva GH, Guo Z, Martinez DST, Lynch I. Daphnia as a model organism to probe biological responses to nanomaterials-from individual to population effects via adverse outcome pathways. Front Toxicol 2023; 5:1178482. [PMID: 37124970 PMCID: PMC10140508 DOI: 10.3389/ftox.2023.1178482] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/02/2023] [Accepted: 04/06/2023] [Indexed: 05/02/2023] [Open Access]
Abstract
The importance of the cladoceran Daphnia as a model organism for ecotoxicity testing has been well-established since the 1980s. Daphnia have been increasingly used in standardised testing of chemicals as they are well characterised and show sensitivity to pollutants, making them an essential indicator species for environmental stress. The mapping of the genomes of D. pulex in 2012 and D. magna in 2017 further consolidated their utility for ecotoxicity testing, including demonstrating the responsiveness of the Daphnia genome to environmental stressors. The short lifecycle and parthenogenetic reproduction make Daphnia useful for assessment of developmental toxicity and adaption to stress. The emergence of nanomaterials (NMs) and their safety assessment has introduced some challenges to the use of standard toxicity tests which were developed for soluble chemicals. NMs have enormous reactive surface areas resulting in dynamic interactions with dissolved organic carbon, proteins and other biomolecules in their surroundings leading to a myriad of physical, chemical, biological, and macromolecular transformations of the NMs and thus changes in their bioavailability to, and impacts on, daphnids. However, NM safety assessments are also driving innovations in our approaches to toxicity testing, for both chemicals and other emerging contaminants such as microplastics (MPs). These advances include establishing more realistic environmental exposures via medium composition tuning including pre-conditioning by the organisms to provide relevant biomolecules as background, development of microfluidics approaches to mimic environmental flow conditions typical in streams, utilisation of field daphnids cultured in the lab to assess adaption and impacts of pre-exposure to pollution gradients, and of course development of mechanistic insights to connect the first encounter with NMs or MPs to an adverse outcome, via the key events in an adverse outcome pathway. 
Insights into these developments are presented below to inspire further advances and utilisation of these important organisms as part of an overall environmental risk assessment of NMs and MPs impacts, including in mixture exposure scenarios.
Affiliation(s)
- Katie Reilly
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
- Laura-Jayne A. Ellis
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
- Hossein Hayat Davoudi
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
- Suffeiya Supian
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
- Marcella T. Maia
  - Brazilian Nanotechnology National Laboratory (LNNano), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, Brazil
- Gabriela H. Silva
  - Brazilian Nanotechnology National Laboratory (LNNano), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, Brazil
- Zhiling Guo
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
  - *Correspondence: Zhiling Guo; Iseult Lynch
- Diego Stéfani T. Martinez
  - Brazilian Nanotechnology National Laboratory (LNNano), Brazilian Center for Research in Energy and Materials (CNPEM), Campinas, Brazil
- Iseult Lynch
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
  - *Correspondence: Zhiling Guo; Iseult Lynch
6
Begum T, Robinson-Rechavi M. Special Care Is Needed in Applying Phylogenetic Comparative Methods to Gene Trees with Speciation and Duplication Nodes. Mol Biol Evol 2021; 38:1614-1626. [PMID: 33169790 PMCID: PMC8042747 DOI: 10.1093/molbev/msaa288] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022] [Open Access]
Abstract
How gene function evolves is a central question of evolutionary biology. It can be investigated by comparing functional genomics results between species and between genes. Most comparative studies of functional genomics have used pairwise comparisons. Yet it has been shown that this can provide biased results, as genes, like species, are phylogenetically related. Phylogenetic comparative methods should be used to correct for this, but they depend on strong assumptions, including unbiased tree estimates relative to the hypothesis being tested. Such methods have recently been used to test the “ortholog conjecture,” the hypothesis that functional evolution is faster in paralogs than in orthologs. Although pairwise comparisons of tissue specificity (τ) provided support for the ortholog conjecture, phylogenetic independent contrasts did not. Our reanalysis on the same gene trees identified problems with the time calibration of duplication nodes. We find that the gene trees used suffer from important biases, due to the inclusion of trees with no duplication nodes, to the relative age of speciations and duplications, to systematic differences in branch lengths, and to non-Brownian motion of tissue specificity on many trees. We find that incorrect implementation of phylogenetic method in empirical gene trees with duplications can be problematic. Controlling for biases allows successful use of phylogenetic methods to study the evolution of gene function and provides some support for the ortholog conjecture using three different phylogenetic approaches.
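The tissue specificity index τ mentioned above is commonly computed from a gene's expression across n tissues as τ = Σ(1 − x_i / x_max) / (n − 1), giving 0 for uniform expression and 1 for expression confined to one tissue. A minimal sketch of that common definition follows. The function name `tissue_specificity_tau` and the toy values are illustrative, and it is an assumption that the study applied exactly this form, including any normalization of the inputs.

```python
def tissue_specificity_tau(expression):
    """Compute tau for one gene from non-negative expression values.

    Hypothetical helper using the common definition:
    tau = sum(1 - x_i / x_max) / (n - 1).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    x_max = max(expression)
    if x_max == 0:
        raise ValueError("tau is undefined for a gene with no expression")
    return sum(1 - x / x_max for x in expression) / (n - 1)


# Toy values only, not taken from any dataset.
print(tissue_specificity_tau([5.0, 5.0, 5.0]))       # broadly expressed gene
print(tissue_specificity_tau([0.0, 0.0, 0.0, 8.0]))  # single-tissue gene
```

Comparing τ between pairs of genes is the pairwise approach the abstract contrasts with phylogenetic independent contrasts, which instead treat τ as a trait evolving along the gene tree.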
Affiliation(s)
- Tina Begum
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
7
Ellis LJA, Kissane S, Hoffman E, Valsami-Jones E, Brown JB, Colbourne JK, Lynch I. Multigenerational Exposure to Nano-TiO2 Induces Ageing as a Stress Response Mitigated by Environmental Interactions. Adv Nanobiomed Res 2021. [DOI: 10.1002/anbr.202000083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023] [Open Access]
Affiliation(s)
- Laura-Jayne A. Ellis
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham B15 2TT, UK
- Stephen Kissane
  - Environmental Transcriptomics Facility, School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK
- Elijah Hoffman
  - Genome Dynamics Department, Life Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Eugenia Valsami-Jones
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham B15 2TT, UK
- James B. Brown
  - Environmental Transcriptomics Facility, School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK
  - Genome Dynamics Department, Life Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- John K. Colbourne
  - Environmental Transcriptomics Facility, School of Biosciences, University of Birmingham, Birmingham B15 2TT, UK
- Iseult Lynch
  - School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham B15 2TT, UK
8
Linard B, Ebersberger I, McGlynn SE, Glover N, Mochizuki T, Patricio M, Lecompte O, Nevers Y, Thomas PD, Gabaldón T, Sonnhammer E, Dessimoz C, Uchiyama I. Ten Years of Collaborative Progress in the Quest for Orthologs. Mol Biol Evol 2021; 38:3033-3045. [PMID: 33822172 PMCID: PMC8321534 DOI: 10.1093/molbev/msab098] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/27/2020] [Revised: 02/07/2021] [Accepted: 04/01/2021] [Indexed: 12/19/2022] [Open Access]
Abstract
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology, or evolutionary relatedness, is in many cases readily determined based on sequence similarity analysis. By contrast, determining whether two genes directly descended from a common ancestor by a speciation event (orthologs) or a duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought to light several challenges to come: leveraging orthology computations to get the most out of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Affiliation(s)
- Benjamin Linard
  - LIRMM, University of Montpellier, CNRS, Montpellier, France; SPYGEN, Le Bourget-du-Lac, France
- Ingo Ebersberger
  - Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, Germany; LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt, Germany
- Shawn E McGlynn
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan; Blue Marble Space Institute of Science, Seattle, WA, USA
- Natasha Glover
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Tomohiro Mochizuki
  - Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
- Mateus Patricio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Odile Lecompte
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Yannis Nevers
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
- Toni Gabaldón
  - Barcelona Supercomputing Centre (BSC-CNS), Jordi Girona, Barcelona, Spain; Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Erik Sonnhammer
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Christophe Dessimoz
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne, Switzerland; Department of Computer Science, University College London, London, United Kingdom; Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Ikuo Uchiyama
  - Department of Theoretical Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
9
Dong N, Bandura J, Zhang Z, Wang Y, Labadie K, Noel B, Davison A, Koene JM, Sun HS, Coutellec MA, Feng ZP. Ion channel profiling of the Lymnaea stagnalis ganglia via transcriptome analysis. BMC Genomics 2021; 22:18. [PMID: 33407100 PMCID: PMC7789530 DOI: 10.1186/s12864-020-07287-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/04/2020] [Accepted: 11/28/2020] [Indexed: 12/27/2022] [Open Access]
Abstract
BACKGROUND The pond snail Lymnaea stagnalis (L. stagnalis) has been widely used as a model organism in neurobiology, ecotoxicology, and parasitology due to the relative simplicity of its central nervous system (CNS). However, its usefulness is restricted by a limited availability of transcriptome data. While sequence information for the L. stagnalis CNS transcripts has been obtained from EST libraries and a de novo RNA-seq assembly, the quality of these assemblies is limited by a combination of low coverage of EST libraries, the fragmented nature of de novo assemblies, and lack of reference genome. RESULTS In this study, taking advantage of the recent availability of a preliminary L. stagnalis genome, we generated an RNA-seq library from the adult L. stagnalis CNS, using a combination of genome-guided and de novo assembly programs to identify 17,832 protein-coding L. stagnalis transcripts. We combined our library with existing resources to produce a transcript set with greater sequence length, completeness, and diversity than previously available ones. Using our assembly and functional domain analysis, we profiled L. stagnalis CNS transcripts encoding ion channels and ionotropic receptors, which are key proteins for CNS function, and compared their sequences to other vertebrate and invertebrate model organisms. Interestingly, L. stagnalis transcripts encoding numerous putative Ca2+ channels showed the most sequence similarity to those of Mus musculus, Danio rerio, Xenopus tropicalis, Drosophila melanogaster, and Caenorhabditis elegans, suggesting that many calcium channel-related signaling pathways may be evolutionarily conserved. CONCLUSIONS Our study provides the most thorough characterization to date of the L. stagnalis transcriptome and provides insights into differences between vertebrates and invertebrates in CNS transcript diversity, according to function and protein class. 
Furthermore, this study provides a complete characterization of the ion channels of Lymnaea stagnalis, opening new avenues for future research on fundamental neurobiological processes in this model system.
Affiliation(s)
- Nancy Dong
  - Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Julia Bandura
  - Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
- Zhaolei Zhang
  - Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
- Yan Wang
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, M5S 3B2, Canada
  - Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, M1C 1A4, Canada
- Karine Labadie
  - Genoscope, Institut de biologie François Jacob, Commissariat à l'Energie Atomique (CEA), Université Paris-Saclay, BP5706, 91057, Evry, France
- Benjamin Noel
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, University of Evry, Université Paris-Saclay, 91057, Evry, France
- Angus Davison
  - School of Life Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Joris M Koene
  - Department of Ecological Science, Faculty of Science, Vrije Universiteit, Amsterdam, The Netherlands
- Hong-Shuo Sun
  - Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
  - Department of Surgery, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
- Zhong-Ping Feng
  - Department of Physiology, University of Toronto, 3308 MSB, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada
10
Poliakov E, Uppal S, Rogozin IB, Gentleman S, Redmond TM. Evolutionary aspects and enzymology of metazoan carotenoid cleavage oxygenases. Biochim Biophys Acta Mol Cell Biol Lipids 2020; 1865:158665. [PMID: 32061750 PMCID: PMC7423639 DOI: 10.1016/j.bbalip.2020.158665] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/16/2019] [Revised: 01/25/2020] [Accepted: 02/05/2020] [Indexed: 12/18/2022]
Abstract
The carotenoids are terpenoid fat-soluble pigments produced by plants, algae, and several bacteria and fungi. They are ubiquitous components of animal diets. Carotenoid cleavage oxygenase (CCO) superfamily members are involved in carotenoid metabolism and are present in all kingdoms of life. Throughout the animal kingdom, carotenoid oxygenases are widely distributed and they are completely absent only in two unicellular organisms, Monosiga and Leishmania. Mammals have three paralogs: 15,15'-β-carotene oxygenase (BCO1), 9',10'-β-carotene oxygenase (BCO2) and RPE65. The first two enzymes are classical carotenoid oxygenases: they cleave carbon-carbon double bonds and incorporate two atoms of oxygen in the substrate at the site of cleavage. The third, RPE65, is an unusual family member: it is the retinoid isomerohydrolase in the visual cycle that converts all-trans-retinyl ester into 11-cis-retinol. Here we discuss evolutionary aspects of the carotenoid cleavage oxygenase superfamily and their enzymology to deduce what insight we can obtain from their evolutionary conservation.
Affiliation(s)
- Eugenia Poliakov
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sheetal Uppal
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Igor B Rogozin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Susan Gentleman
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
- T Michael Redmond
  - Laboratory of Retinal Cell & Molecular Biology, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
11
Ahrens JB, Teufel AI, Siltberg-Liberles J. A Phylogenetic Rate Parameter Indicates Different Sequence Divergence Patterns in Orthologs and Paralogs. J Mol Evol 2020; 88:720-730. [PMID: 33118098 DOI: 10.1007/s00239-020-09969-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2020] [Accepted: 10/15/2020] [Indexed: 10/23/2022]
Abstract
Heterotachy, the change in sequence evolutionary rate over time, is a common feature of protein molecular evolution. Decades of studies have shed light on the conditions under which heterotachy occurs, and there is evidence that site-specific evolutionary rate shifts are correlated with changes in protein function. Here, we present a large-scale, computational analysis using thousands of protein sequence alignments from animal and plant proteomes, representing genes related either by orthology (speciation events) or paralogy (gene duplication), to compare sequence divergence patterns in orthologous vs. paralogous sequence alignments. We use sequence-based phylogenetic analyses to infer overall sequence divergence (tree length/number of sequences) and to fit site-specific rates to a discrete gamma distribution with a shape parameter α. This inference method is applied to real protein sequence alignments, as well as alignments simulated under various models of protein sequence evolution. Our simulations indicate that sequence divergence and the α parameter are positively correlated when sequences evolve with heterotachy, meaning that inferred site rate distributions appear more uniform as sequences diverge. Divergence and α are also positively correlated in both orthologous and paralogous genes, but the average increase in α (as a function of divergence) is significantly higher in paralogous protein alignments than in orthologous alignments. This result is consistent with the widely held view that recently duplicated proteins initially evolve under relaxed selective pressure, promoting functional divergence by accumulation of amino acid replacements, and hence experience more evolutionary rate fluctuations than orthologous proteins. We discuss these findings in the context of the ortholog conjecture, a long-standing assumption in molecular evolution, which posits that protein sequences related by orthology tend to be more functionally conserved than paralogous proteins.
Affiliation(s)
- Joseph B Ahrens
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA; Department of Biochemistry and Molecular Genetics, Computational Bioscience Program, University of Colorado Denver, Aurora, CO, USA
- Ashley I Teufel
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA; Santa Fe Institute, Santa Fe, NM, USA
- Jessica Siltberg-Liberles
- Department of Biological Sciences, Biomolecular Sciences Institute, Florida International University, Miami, FL, USA
|
12
|
Amalgamated cross-species transcriptomes reveal organ-specific propensity in gene expression evolution. Nat Commun 2020; 11:4459. [PMID: 32900997 PMCID: PMC7479108 DOI: 10.1038/s41467-020-18090-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 10/16/2019] [Accepted: 07/29/2020] [Indexed: 12/24/2022] Open
Abstract
The origins of multicellular physiology are tied to evolution of gene expression. Genes can shift expression as organisms evolve, but how ancestral expression influences altered descendant expression is not well understood. To examine this, we amalgamate 1,903 RNA-seq datasets from 182 research projects, including 6 organs in 21 vertebrate species. Quality control eliminates project-specific biases, and expression shifts are reconstructed using gene-family-wise phylogenetic Ornstein-Uhlenbeck models. Expression shifts following gene duplication result in more drastic changes in expression properties than shifts without gene duplication. The expression properties are tightly coupled with protein evolutionary rate, depending on whether and how gene duplication occurred. Fluxes in expression patterns among organs are nonrandom, forming modular connections that are reshaped by gene duplication. Thus, if expression shifts, ancestral expression in some organs induces a strong propensity for expression in particular organs in descendants. Regardless of whether the shifts are adaptive or not, this supports a major role for what might be termed preadaptive pathways of gene expression evolution.
|
13
|
Stamboulian M, Guerrero RF, Hahn MW, Radivojac P. The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 2020; 36:i219-i226. [PMID: 32657391 PMCID: PMC7355290 DOI: 10.1093/bioinformatics/btaa468] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 11/18/2022] Open
Abstract
MOTIVATION The computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes or proteins for which experimental evidence exists. The 'ortholog conjecture' proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes. Previous research has provided little support for the ortholog conjecture, though the incomplete nature of the data cast doubt on the conclusions. RESULTS We use experimental annotations from over 40 000 proteins, drawn from over 80 000 publications, to revisit the ortholog conjecture in two pairs of species: (i) Homo sapiens and Mus musculus and (ii) Saccharomyces cerevisiae and Schizosaccharomyces pombe. By making a distinction between questions about the evolution of function versus questions about the prediction of function, we find strong evidence against the ortholog conjecture in the context of function prediction, though questions about the evolution of function remain difficult to address. In both pairs of species, we quantify the amount of information that would be ignored if paralogs are discarded, as well as the resulting loss in prediction accuracy. Taken as a whole, our results support the view that the types of homologs used for function transfer are largely irrelevant to the task of function prediction. Maximizing the amount of data used for this task, regardless of whether it comes from orthologs or paralogs, is most likely to lead to higher prediction accuracy. AVAILABILITY AND IMPLEMENTATION https://github.com/predragradivojac/oc. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moses Stamboulian
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Rafael F Guerrero
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Matthew W Hahn
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
|
14
|
David KT, Oaks JR, Halanych KM. Patterns of gene evolution following duplications and speciations in vertebrates. PeerJ 2020; 8:e8813. [PMID: 32266119 PMCID: PMC7120047 DOI: 10.7717/peerj.8813] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/10/2019] [Accepted: 02/27/2020] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Eukaryotic genes typically form independent evolutionary lineages through either speciation or gene duplication events. Generally, gene copies resulting from speciation events (orthologs) are expected to maintain similarity over time with regard to sequence, structure and function. After a duplication event, however, resulting gene copies (paralogs) may experience a broader set of possible fates, including partial (subfunctionalization) or complete loss of function, as well as gain of new function (neofunctionalization). This assumption, known as the Ortholog Conjecture, is prevalent throughout molecular biology and notably plays an important role in many functional annotation methods. Unfortunately, studies that explicitly compare evolutionary processes between speciation and duplication events are rare and conflicting. METHODS To provide an empirical assessment of ortholog/paralog evolution, we estimated ratios of nonsynonymous to synonymous substitutions (ω = dN/dS) for 251,044 lineages in 6,244 gene trees across 77 vertebrate taxa. RESULTS Overall, we found ω to be more similar between lineages descended from speciation events (p < 0.001) than lineages descended from duplication events, providing strong support for the Ortholog Conjecture. The asymmetry in ω following duplication events appears to be largely driven by an increase along one of the paralogous lineages, while the other remains similar to the parent. This trend is commonly associated with neofunctionalization, suggesting that gene duplication is a significant mechanism for generating novel gene functions.
Affiliation(s)
- Kyle T. David
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
- Jamie R. Oaks
- Department of Biological Sciences, Auburn University, Auburn, AL, USA
|
15
|
Fan J, Cannistra A, Fried I, Lim T, Schaffner T, Crovella M, Hescott B, Leiserson MDM. Functional protein representations from biological networks enable diverse cross-species inference. Nucleic Acids Res 2019; 47:e51. [PMID: 30847485 PMCID: PMC6511848 DOI: 10.1093/nar/gkz132] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/27/2018] [Revised: 01/09/2019] [Accepted: 02/18/2019] [Indexed: 12/31/2022] Open
Abstract
Transferring knowledge between species is key for many biological applications, but is complicated by divergent and convergent evolution. Many current approaches for this problem leverage sequence and interaction network data to transfer knowledge across species, exemplified by network alignment methods. While these techniques do well, they are limited in scope, creating metrics to address one specific problem or task. We take a different approach by creating an environment where multiple knowledge transfer tasks can be performed using the same protein representations. Specifically, our kernel-based method, MUNK, integrates sequence and network structure to create functional protein representations, embedding proteins from different species in the same vector space. First we show proteins in different species that are close in MUNK-space are functionally similar. Next, we use these representations to share knowledge of synthetic lethal interactions between species. Importantly, we find that the results using MUNK-representations are at least as accurate as existing algorithms for these tasks. Finally, we generalize the notion of a phenolog ('orthologous phenotype') to use functionally similar proteins (i.e. those with similar representations). We demonstrate the utility of this broadened notion by using it to identify known phenologs and novel non-obvious ones supported by current research.
Affiliation(s)
- Jason Fan
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
- Inbar Fried
- University of North Carolina Medical School, USA
- Tim Lim
- Department of Computer Science, Boston University, USA
- Mark Crovella
- Department of Computer Science, Boston University, USA
- Benjamin Hescott
- College of Computer and Information Science, Northeastern University, USA
- Mark D M Leiserson
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA
|
16
|
Zmasek CM, Knipe DM, Pellett PE, Scheuermann RH. Classification of human Herpesviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO). Virology 2019; 529:29-42. [PMID: 30660046 PMCID: PMC6502252 DOI: 10.1016/j.virol.2019.01.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/16/2018] [Revised: 01/04/2019] [Accepted: 01/04/2019] [Indexed: 12/13/2022]
Abstract
We developed a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO) for the analysis of protein orthology by combining phylogenetic and protein domain-architecture information. Using DAIO, we performed a systematic study of the proteomes of all human Herpesviridae species to define Strict Ortholog Groups (SOGs). In addition to assessing the taxonomic distribution for each protein based on sequence similarity, we performed a protein domain-architecture analysis for every protein family and computationally inferred gene duplication events. While many herpesvirus proteins have evolved without any detectable gene duplications or domain rearrangements, numerous herpesvirus protein families do exhibit complex evolutionary histories. Some proteins acquired additional domains (e.g., DNA polymerase), whereas others show a combination of domain acquisition and gene duplication (e.g., betaherpesvirus US22 family), with possible functional implications. This novel classification system of SOGs for human Herpesviridae proteins is available through the Virus Pathogen Resource (ViPR, www.viprbrc.org).
Affiliation(s)
- David M Knipe
- Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Philip E Pellett
- Department of Biochemistry, Microbiology & Immunology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Richard H Scheuermann
- J. Craig Venter Institute, La Jolla, CA 92037, USA; Department of Pathology, University of California, San Diego, CA 92093, USA; Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA 92037, USA
|
17
|
Ambrosino L, Ruggieri V, Bostan H, Miralto M, Vitulo N, Zouine M, Barone A, Bouzayen M, Frusciante L, Pezzotti M, Valle G, Chiusano ML. Multilevel comparative bioinformatics to investigate evolutionary relationships and specificities in gene annotations: an example for tomato and grapevine. BMC Bioinformatics 2018; 19:435. [PMID: 30497367 PMCID: PMC6266932 DOI: 10.1186/s12859-018-2420-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/01/2022] Open
Abstract
Background “Omics” approaches may provide useful information for a deeper understanding of speciation events, diversification and functional innovation. This can be achieved by investigating sequence-level molecular similarities between species, which allows ortholog and paralog genes to be defined. However, the growing number of sequenced genomes, often with still-preliminary annotations, requires suitable bioinformatics methods to be exploited appropriately in this framework. Results We present here a multilevel comparative approach to investigate the genome evolutionary relationships and peculiarities of two fleshy fruit species of agronomic interest, Solanum lycopersicum (tomato) and Vitis vinifera (grapevine). We defined 17,823 orthology relationships between the tomato and grapevine reference gene annotations. The resulting orthologs are associated with the paralogs detected in each species, permitting the definition of gene networks that are useful for investigating the different relationships. The compared collections were also reconciled to update their functional descriptions. All results are accessible in ComParaLogs, a dedicated bioinformatics platform available at http://biosrv.cab.unina.it/comparalogs/gene/search. Conclusions The aim of the work was to suggest a reliable approach for detecting all similarities between gene loci of two species by integrating results from different levels of information, such as gene, transcript and protein sequences, overcoming possible limits of exclusively protein-versus-protein comparisons. This makes it possible to define reliable ortholog and paralog genes, as well as species-specific gene loci in the two species, overcoming limits due to the possible draft nature of preliminary gene annotations. Moreover, the reconciled functional descriptions, the common or peculiar enzymatic classes and protein domains from tomato and grapevine, and the species-specific gene sets defined after the pairwise comparisons together provide a comprehensive set of information for comparatively exploiting the gene annotations of the two species and for investigating differences between species with climacteric and non-climacteric fruits. In addition, the networks of ortholog genes and associated paralogs, and the web-based interfaces for exploring the results, provide a user-friendly computational workbench in support of comparative analyses between two species. Electronic supplementary material The online version of this article (10.1186/s12859-018-2420-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Luca Ambrosino
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
- Valentino Ruggieri
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Center for Research in Agricultural Genomics, Cerdanyola, Barcelona, Spain
- Hamed Bostan
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Plants for Human Health Institute, North Carolina State University, Kannapolis, NC, USA
- Marco Miralto
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Current address: Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
- Nicola Vitulo
- Department of Biotechnology, University of Verona, Verona, Italy
- Mohamed Zouine
- Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
- Amalia Barone
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
- Mondher Bouzayen
- Génomique et Biotechnologie des Fruits, UMR990 INRA / INP-Toulouse, Université de Toulouse, Castanet-Tolosan, France
- Luigi Frusciante
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy
- Mario Pezzotti
- Department of Biotechnology, University of Verona, Verona, Italy
- Giorgio Valle
- CRIBI Biotechnology Centre, University of Padova, Padova, Italy
- Maria Luisa Chiusano
- Department of Agriculture, University of Naples "Federico II", Portici, Naples, Italy; Research Infrastructures for Marine Biological Resources, Stazione Zoologica Anton Dohrn, Naples, Italy
|
18
|
Mier P, Pérez-Pulido AJ, Andrade-Navarro MA. Automated selection of homologs to track the evolutionary history of proteins. BMC Bioinformatics 2018; 19:431. [PMID: 30453878 PMCID: PMC6245638 DOI: 10.1186/s12859-018-2457-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/25/2018] [Accepted: 10/31/2018] [Indexed: 11/26/2022] Open
Abstract
Background The selection of distant homologs of a query protein under study is a common and useful application of protein sequence databases. Such sets of homologs are often applied to investigate the function of a protein and the degree to which experimental results can be transferred from one organism to another. In particular, a variety of databases facilitate static browsing for orthologs. However, these resources have limited power when identifying orthologs between taxonomically distant species. In addition, in some situations, for a given query protein, it is advantageous to compare the sets of orthologs from different specific organisms: this recursive step-wise search might give an idea of the evolutionary path of the protein as a series of consecutive steps, for example gaining or losing domains. However, a step-wise orthology search is a time-consuming task if the number of steps is high. Results To illustrate a solution to this problem, we present the web tool ProteinPathTracker, which tracks the evolutionary history of a query protein by locating homologs in selected proteomes along several evolutionary paths. Additional functionalities include locking a region of interest to follow its evolution in the discovered homologous sequences, and studying the evolution of protein function by analyzing the annotations of the homologs. Conclusions ProteinPathTracker is an easy-to-use web tool that automatises the practice of looking for selected homologs in distant species in a straightforward way for non-expert users. Electronic supplementary material The online version of this article (10.1186/s12859-018-2457-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Pablo Mier
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
|
19
|
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors and stop codons can be misread by tRNAs, both of which also contribute to codon usage in rare cases. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index for codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. They are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
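To make the codon adaptation index mentioned above concrete, here is a minimal Python sketch of the commonly used CAI definition: each codon gets a relative adaptiveness w (its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. The two-family codon table, the reference counts, and the function names are illustrative assumptions, not taken from the chapter. The sketch does not implement the Index of Translation Elongation.

```python
import math
from typing import Dict, List

# Toy synonymous codon families (assumption: a real analysis would use the
# full genetic code and exclude single-codon families such as Met and Trp).
FAMILIES: Dict[str, List[str]] = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
}


def relative_adaptiveness(ref_counts: Dict[str, int]) -> Dict[str, float]:
    """Return w for each codon: its count divided by the family maximum."""
    w: Dict[str, float] = {}
    for codons in FAMILIES.values():
        top = max(ref_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # family absent from the reference set
        for c in codons:
            w[c] = ref_counts.get(c, 0) / top
    return w


def cai(gene_codons: List[str], w: Dict[str, float]) -> float:
    """Geometric mean of w over the codons of a gene.

    Codons without a positive weight are skipped here; this is one of
    several conventions for handling zero counts.
    """
    logs = [math.log(w[c]) for c in gene_codons if w.get(c, 0.0) > 0.0]
    if not logs:
        raise ValueError("no scorable codons in gene")
    return math.exp(sum(logs) / len(logs))


# Hypothetical codon counts from a reference set of highly expressed genes.
ref_counts = {"AAA": 20, "AAG": 80, "GAA": 90, "GAG": 10}
weights = relative_adaptiveness(ref_counts)
optimal_gene = cai(["AAG", "GAA"], weights)  # uses only the preferred codons
mixed_gene = cai(["AAA", "GAA"], weights)    # one rare and one preferred codon
```

Because w depends only on counts within the reference set, this form of CAI has no term for background mutation bias, which is the limitation the abstract raises.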
|
20
|
Streubel S, Fritz MA, Teltow M, Kappel C, Sicard A. Successive duplication-divergence mechanisms at the RCO locus contributed to leaf shape diversity in the Brassicaceae. Development 2018; 145(8):dev164301. [PMID: 29691226 DOI: 10.1242/dev.164301] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/07/2018] [Accepted: 03/21/2018] [Indexed: 12/19/2022]
Abstract
Gene duplication is a major driver for the increase of biological complexity. The divergence of newly duplicated paralogs may allow novel functions to evolve, while maintaining the ancestral one. Alternatively, partitioning the ancestral function among paralogs may allow parts of that role to follow independent evolutionary trajectories. We studied the REDUCED COMPLEXITY (RCO) locus, which contains three paralogs that have evolved through two independent events of gene duplication, and which underlies repeated events of leaf shape evolution within the Brassicaceae. In particular, we took advantage of the presence of three potentially functional paralogs in Capsella to investigate the extent of functional divergence among them. We demonstrate that the RCO copies control growth in different areas of the leaf. Consequently, the copies that are retained active in the different Brassicaceae lineages contribute to define the leaf dissection pattern. Our results further illustrate how successive gene duplication events and subsequent functional divergence can increase trait evolvability by providing independent evolutionary trajectories to specialized functions that have an additive effect on a given trait.
Affiliation(s)
- Susanna Streubel
- Institut für Biochemie und Biologie, Universität Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Germany
- Michael André Fritz
- Institut für Biochemie und Biologie, Universität Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Germany
- Melanie Teltow
- Institut für Biochemie und Biologie, Universität Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Germany
- Christian Kappel
- Institut für Biochemie und Biologie, Universität Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Germany
- Adrien Sicard
- Institut für Biochemie und Biologie, Universität Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam-Golm, Germany; Uppsala Biocenter, Department of Plant Biology, BOX 7080, 750 07, Uppsala, Sweden
|
21
|
Austin ED, Hamid R. Y Not? Sex Chromosomes May Modify Sexual Dimorphism in Pulmonary Hypertension. Am J Respir Crit Care Med 2018; 197:858-859. [PMID: 28968140 DOI: 10.1164/rccm.201709-1865ed] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/21/2023] Open
Affiliation(s)
- Eric D Austin
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee
- Rizwan Hamid
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee
|
22
|
Abstract
The prevalence of purifying selection in nature suggests that larger organisms bear a higher number of slightly deleterious mutations because of smaller populations and therefore weaker selection. In this work, a genome-wide analysis found a redistribution of purifying selection in favor of information genes, pathways and processes in primates compared with treeshrew and rodents. The genes that are more favored in primates belong mainly to regulation of gene expression and development; those favored in treeshrew and rodents belong to metabolism, transport, energetics, reproduction and olfaction. The former occur predominantly in the nucleus, the latter in the cytoplasm and membranes. Thus, although purifying selection is on average weaker in the primates, it is more strongly concentrated on the "information technology" of life (regulation of gene expression and development). Increased accuracy of information processes probably allows "error catastrophes" to be avoided in spite of more complex organization, larger body size and higher longevity.
|
23
|
ARSDA: A New Approach for Storing, Transmitting and Analyzing Transcriptomic Data. G3-GENES GENOMES GENETICS 2017; 7:3839-3848. [PMID: 29079682 PMCID: PMC5714481 DOI: 10.1534/g3.117.300271] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
Two major stumbling blocks exist in high-throughput sequencing (HTS) data analysis. The first is the sheer file size, typically in gigabytes when uncompressed, causing problems in storage, transmission, and analysis. However, these files do not need to be so large, and can be reduced without loss of information. Each HTS file, either in compressed .SRA or plain text .fastq format, contains numerous identical reads stored as separate entries. For example, among 44,603,541 forward reads in the SRR4011234.sra file (from a Bacillus subtilis transcriptomic study) deposited at NCBI’s SRA database, one read has 497,027 identical copies. Instead of storing them as separate entries, one can and should store them as a single entry with the SeqID_NumCopy format (which I dub the FASTA+ format). The second is the proper allocation of reads that map equally well to paralogous genes. I illustrate in detail a new method for such allocation. I have developed ARSDA software that implements these new approaches. A number of HTS files for model species are in the process of being processed and deposited at http://coevol.rdc.uottawa.ca to demonstrate that this approach not only saves a huge amount of storage space and transmission bandwidth, but also dramatically reduces time in downstream data analysis. Instead of matching the 497,027 identical reads separately against the B. subtilis genome, one only needs to match it once. ARSDA includes functions to take advantage of HTS data in the new sequence format for downstream data analysis such as gene expression characterization. I contrasted gene expression results between ARSDA and Cufflinks so readers can better appreciate the strength of ARSDA. ARSDA is freely available for Windows, Linux, and Macintosh computers at http://dambe.bio.uottawa.ca/ARSDA/ARSDA.aspx.
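The idea of storing identical reads once, with the copy number carried in the sequence ID, can be sketched in a few lines of Python. This is an illustration of the SeqID_NumCopy idea described in the abstract and not ARSDA's actual implementation: the `Read<index>_<copies>` header layout, the function names, and the four-read input are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def collapse_reads(sequences: Iterable[str]) -> List[Tuple[str, str]]:
    """Collapse identical reads into (header, sequence) pairs.

    Each header has the hypothetical form Read<index>_<copies>, so the copy
    number survives even though the sequence is stored only once.
    """
    counts = Counter(sequences)
    return [
        (f"Read{idx}_{copies}", seq)
        for idx, (seq, copies) in enumerate(counts.most_common(), start=1)
    ]


def to_fasta(entries: List[Tuple[str, str]]) -> str:
    """Render collapsed entries as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in entries)


# Tiny made-up FASTQ input: three copies of one read and one copy of another.
fastq_text = """@r1
ACGT
+
IIII
@r2
ACGT
+
IIII
@r3
GGCC
+
IIII
@r4
ACGT
+
IIII
"""
collapsed = collapse_reads(fastq_sequences(fastq_text.splitlines()))
fasta_plus = to_fasta(collapsed)
```

A downstream mapper would then align each unique sequence once and weight the hit by the copy number parsed from the header. Note that this sketch discards per-read quality scores, so it is lossless with respect to sequences only.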
|
24
|
Mans BJ, Featherston J, de Castro MH, Pienaar R. Gene Duplication and Protein Evolution in Tick-Host Interactions. Front Cell Infect Microbiol 2017; 7:413. [PMID: 28993800 PMCID: PMC5622192 DOI: 10.3389/fcimb.2017.00413] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 05/27/2017] [Accepted: 09/06/2017] [Indexed: 01/01/2023] Open
Abstract
Ticks modulate their hosts' defense responses by secreting a biopharmacopoeia of hundreds to thousands of proteins and bioactive chemicals into the feeding site (tick-host interface). These molecules and their functions evolved over millions of years as ticks adapted to blood-feeding, tick lineages diverged, and host-shifts occurred. The evolution of new proteins with new functions is mainly dependent on gene duplication events. Central questions around this are the rates of gene duplication, when they occurred and how new functions evolve after gene duplication. The current review investigates these questions in the light of tick biology and considers the possibilities of ancient genome duplication, lineage specific expansion events, and the role that positive selection played in the evolution of tick protein function. It contrasts current views in tick biology regarding adaptive evolution with the more general view that neutral evolution may account for the majority of biological innovations observed in ticks.
Affiliation(s)
- Ben J Mans
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, South Africa; Department of Veterinary Tropical Diseases, University of Pretoria, Pretoria, South Africa; Department of Life and Consumer Sciences, University of South Africa, Pretoria, South Africa
- Jonathan Featherston
- Agricultural Research Council-The Biotechnology Platform, Onderstepoort, South Africa
- Minique H de Castro
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, South Africa; Department of Life and Consumer Sciences, University of South Africa, Pretoria, South Africa; Agricultural Research Council-The Biotechnology Platform, Onderstepoort, South Africa
- Ronel Pienaar
- Epidemiology, Parasites and Vectors, Agricultural Research Council-Onderstepoort Veterinary Research, Onderstepoort, South Africa
|
25
|
Guschanski K, Warnefors M, Kaessmann H. The evolution of duplicate gene expression in mammalian organs. Genome Res 2017; 27:1461-1474. [PMID: 28743766 PMCID: PMC5580707 DOI: 10.1101/gr.215566.116] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 09/05/2016] [Accepted: 07/18/2017] [Indexed: 12/16/2022]
Abstract
Gene duplications generate genomic raw material that allows the emergence of novel functions, likely facilitating adaptive evolutionary innovations. However, global assessments of the functional and evolutionary relevance of duplicate genes in mammals were until recently limited by the lack of appropriate comparative data. Here, we report a large-scale study of the expression evolution of DNA-based functional gene duplicates in three major mammalian lineages (placental mammals, marsupials, egg-laying monotremes) and birds, on the basis of RNA sequencing (RNA-seq) data from nine species and eight organs. We observe dynamic changes in tissue expression preference of paralogs with different duplication ages, suggesting differential contribution of paralogs to specific organ functions during vertebrate evolution. Specifically, we show that paralogs that emerged in the common ancestor of bony vertebrates are enriched for genes with brain-specific expression and provide evidence for differential forces underlying the preferential emergence of young testis- and liver-specific expressed genes. Further analyses uncovered that the overall spatial expression profiles of gene families tend to be conserved, with several exceptions of pronounced tissue specificity shifts among lineage-specific gene family expansions. Finally, we trace new lineage-specific genes that may have contributed to the specific biology of mammalian organs, including the little-studied placenta. Overall, our study provides novel and taxonomically broad evidence for the differential contribution of duplicate genes to tissue-specific transcriptomes and for their importance for the phenotypic evolution of vertebrates.
Affiliation(s)
- Katerina Guschanski
- Department of Animal Ecology, Evolutionary Biology Centre, Uppsala University, S-75105 Uppsala, Sweden
- Maria Warnefors
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, D-69120 Heidelberg, Germany
- Henrik Kaessmann
- Center for Molecular Biology of Heidelberg University (ZMBH), DKFZ-ZMBH Alliance, D-69120 Heidelberg, Germany
|
26
|
Abstract
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches.
Affiliation(s)
- Domenico Cozzetto
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- David T Jones
- Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
|
27
|
Kryuchkova-Mostacci N, Robinson-Rechavi M. Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs. PLoS Comput Biol 2016; 12:e1005274. [PMID: 28030541 PMCID: PMC5193323 DOI: 10.1371/journal.pcbi.1005274] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 08/05/2016] [Accepted: 11/26/2016] [Indexed: 11/18/2022] Open
Abstract
The ortholog conjecture implies that functional similarity between orthologous genes is higher than between paralogs. It has been supported using levels of expression and Gene Ontology term analysis, although the evidence was rather weak and there were also conflicting reports. In this study on 12 species, we provide strong evidence of high conservation in tissue-specificity between orthologs, in contrast to low conservation between within-species paralogs. This allows us to shed new light on the evolution of gene expression patterns. While there have been several studies of the correlation of expression between species, little is known about the evolution of tissue-specificity itself. Ortholog tissue-specificity is strongly conserved between all tetrapod species, with the lowest Pearson correlation between mouse and frog at r = 0.66. Tissue-specificity correlation decreases strongly with divergence time. Paralogs in human show much lower conservation, even for recent Primate-specific paralogs. When both paralogs from an ancient whole-genome duplication are tissue-specific, it is often to different tissues, while other tissue-specific paralogs are mostly specific to the same tissue. The same patterns are observed using human or mouse as focal species, and are robust to choices of datasets and of thresholds. Our results support the following model of evolution: in the absence of duplication, tissue-specificity evolves slowly, and tissue-specific genes do not change their main tissue of expression; after small-scale duplication the less expressed paralog loses the ancestral specificity, leading to an immediate difference between paralogs; over time, both paralogs become more broadly expressed, but remain poorly correlated. Finally, there is a small number of paralog pairs which stay tissue-specific with the same main tissue of expression, for at least 300 million years.
Affiliation(s)
- Nadezda Kryuchkova-Mostacci
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
28
|
Ibn-Salem J, Muro EM, Andrade-Navarro MA. Co-regulation of paralog genes in the three-dimensional chromatin architecture. Nucleic Acids Res 2016; 45:81-91. [PMID: 27634932 PMCID: PMC5224500 DOI: 10.1093/nar/gkw813] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 11/27/2015] [Revised: 08/31/2016] [Accepted: 09/03/2016] [Indexed: 12/20/2022] Open
Abstract
Paralog genes arise from gene duplication events during evolution, which often lead to similar proteins that cooperate in common pathways and in protein complexes. Consequently, paralogs show correlated gene expression, although the mechanisms of co-regulation remain unclear. In eukaryotes, genes are regulated in part by distal enhancer elements through looping interactions with gene promoters. These looping interactions can be measured by genome-wide chromatin conformation capture (Hi-C) experiments, which revealed self-interacting regions called topologically associating domains (TADs). We hypothesize that paralogs share common regulatory mechanisms to enable coordinated expression according to TADs. To test this hypothesis, we integrated paralogy annotations with human gene expression data in diverse tissues, genome-wide enhancer-promoter associations and Hi-C experiments in human, mouse and dog genomes. We show that paralog gene pairs are enriched for co-localization in the same TAD, share common enhancer elements more often than expected and have increased contact frequencies over large genomic distances. Combined, our results indicate that paralogs share common regulatory mechanisms and cluster not only in the linear genome but also in the three-dimensional chromatin architecture. This enables concerted expression of paralogs across diverse cell types and indicates evolutionary constraints in functional genome organization.
Affiliation(s)
- Jonas Ibn-Salem
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany; Institute of Molecular Biology, 55128 Mainz, Germany
- Enrique M Muro
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany; Institute of Molecular Biology, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg University of Mainz, 55128 Mainz, Germany; Institute of Molecular Biology, 55128 Mainz, Germany
|
29
|
Cheung PPH, Rogozin IB, Choy KT, Ng HY, Peiris JSM, Yen HL. Comparative mutational analyses of influenza A viruses. RNA (NEW YORK, N.Y.) 2015; 21:36-47. [PMID: 25404565 PMCID: PMC4274636 DOI: 10.1261/rna.045369.114] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 05/12/2023]
Abstract
The error-prone RNA-dependent RNA polymerase (RdRP) and external selective pressures are the driving forces for RNA viral diversity. When confounded by selective pressures, it is difficult to assess if influenza A viruses (IAV) that have a wide host range possess comparable or distinct spontaneous mutational frequency in their RdRPs. We used in-depth bioinformatics analyses to assess the spontaneous mutational frequencies of two RdRPs derived from human seasonal (A/Wuhan/359/95; Wuhan) and H5N1 (A/Vietnam/1203/04; VN1203) viruses using the mini-genome system with a common firefly luciferase reporter serving as the template. High-fidelity reverse transcriptase was applied to generate high-quality mutational spectra which allowed us to assess and compare the mutational frequencies and mutable motifs along a target sequence of the two RdRPs of two different subtypes. We observed correlated mutational spectra (τ correlation P < 0.0001), comparable mutational frequencies (H3N2:5.8 ± 0.9; H5N1:6.0 ± 0.5), and discovered a highly mutable motif "(A)AAG" for both Wuhan and VN1203 RdRPs. Results were then confirmed with two recombinant A/Puerto Rico/8/34 (PR8) viruses that possess RdRP derived from Wuhan or VN1203 (RG-PR8×Wuhan(PB2, PB1, PA, NP) and RG-PR8×VN1203(PB2, PB1, PA, NP)). Applying novel bioinformatics analysis on influenza mutational spectra, we provide a platform for a comprehensive analysis of the spontaneous mutation spectra for an RNA virus.
Affiliation(s)
- Peter Pak-Hang Cheung
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894-6075, USA
- Ka-Tim Choy
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Hoi Yee Ng
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Joseph Sriyal Malik Peiris
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Hui-Ling Yen
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
|
30
|
Sonnhammer ELL, Gabaldón T, Sousa da Silva AW, Martin M, Robinson-Rechavi M, Boeckmann B, Thomas PD, Dessimoz C. Big data and other challenges in the quest for orthologs. Bioinformatics 2014; 30:2993-8. [PMID: 25064571 PMCID: PMC4201156 DOI: 10.1093/bioinformatics/btu492] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 04/16/2014] [Revised: 06/25/2014] [Accepted: 07/16/2014] [Indexed: 01/29/2023] Open
Abstract
Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION All such materials are available at http://questfororthologs.org.
Affiliation(s)
- Erik L L Sonnhammer, Toni Gabaldón, Alan W Sousa da Silva, Maria Martin, Marc Robinson-Rechavi, Brigitte Boeckmann, Paul D Thomas, Christophe Dessimoz
- Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
|
31
|
Pereira C, Denise A, Lespinet O. A meta-approach for improving the prediction and the functional annotation of ortholog groups. BMC Genomics 2014; 15 Suppl 6:S16. [PMID: 25573073 PMCID: PMC4240552 DOI: 10.1186/1471-2164-15-s6-s16] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In comparative genomics, orthologs are used to transfer annotation from genes already characterized to newly sequenced genomes. Many methods have been developed for finding orthologs in sets of genomes. However, the application of different methods on the same proteome set can lead to distinct orthology predictions. METHODS We developed a method based on a meta-approach that is able to combine the results of several methods for orthologous group prediction. The purpose of this method is to produce better quality results by using the overlapping results obtained from several individual orthologous gene prediction procedures. Our method proceeds in two steps. The first aims to construct seeds for groups of orthologous genes; these seeds correspond to the exact overlaps between the results of all or several methods. In the second step, these seed groups are expanded by using HMM profiles. RESULTS We evaluated our method on two standard reference benchmarks, OrthoBench and Orthology Benchmark Service. Our method presents a higher level of accurately predicted groups than the individual input methods of orthologous group prediction. Moreover, our method increases the number of annotated orthologous pairs without decreasing the annotation quality compared to twelve state-of-the-art methods. CONCLUSIONS The meta-approach based method appears to be a reliable procedure for predicting orthologous groups. Since a large number of methods for predicting groups of orthologous genes exist, it is quite conceivable to apply this meta-approach to several combinations of different methods.
|
32
|
Complexity of gene expression evolution after duplication: protein dosage rebalancing. GENETICS RESEARCH INTERNATIONAL 2014; 2014:516508. [PMID: 25197576 PMCID: PMC4150538 DOI: 10.1155/2014/516508] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/26/2014] [Accepted: 08/03/2014] [Indexed: 11/17/2022]
Abstract
Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes.
|