1
González-Lozano KJ, Aréchiga-Carvajal ET, Jiménez-Salas Z, Valdez-Rodríguez DM, León-Ramírez CG, Ruiz-Herrera J, Adame-Rodríguez JM, López-Cabanillas-Lomelí M, Campos-Góngora E. Identification and Characterization of Dmct: A Cation Transporter in Yarrowia lipolytica Involved in Metal Tolerance. J Fungi (Basel) 2023; 9:600. [PMID: 37367535 DOI: 10.3390/jof9060600] [Received: 03/03/2023] [Revised: 05/05/2023] [Accepted: 05/08/2023] [Indexed: 06/28/2023]
Abstract
Yarrowia lipolytica is a dimorphic fungus used as a model organism to investigate diverse biotechnological and biological processes, such as cell differentiation, heterologous protein production, and bioremediation strategies. However, little is known about the biological processes responsible for cation concentration homeostasis. Metals play pivotal roles in critical biochemical processes, and some are toxic at unbalanced intracellular concentrations. Membrane transport proteins control intracellular cation concentrations. Analysis of the Y. lipolytica genome revealed a gene, YALI0F19734g, that carries a characteristic functional domain of the cation efflux protein family and encodes YALI0F19734p (a putative Yl-Dmct protein), which is related to divalent metal cation tolerance. We report the in silico analysis of the putative Yl-Dmct protein's characteristics and the phenotypic response to divalent cations (Ca2+, Cu2+, Fe2+, and Zn2+) of the mutant strains Δdmct and Rdmct, constructed by deletion and reinsertion of the DMCT gene, respectively. The absence of the Yl-Dmct protein induces cellular and growth rate changes, as well as dimorphism differences, when calcium, copper, iron, and zinc are added to the culture medium. Interestingly, the parental and mutant strains were able to internalize the ions. Our results suggest that the protein encoded by the DMCT gene is involved in cell development and cation homeostasis in Y. lipolytica.
Affiliation(s)
- Katia Jamileth González-Lozano
  - Universidad Autónoma de Nuevo León, Facultad de Ciencias Biológicas, Departamento de Microbiología, LMYF, Unidad de Manipulación Genética, Monterrey CP 66455, Nuevo León, Mexico
- Elva Teresa Aréchiga-Carvajal
  - Universidad Autónoma de Nuevo León, Facultad de Ciencias Biológicas, Departamento de Microbiología, LMYF, Unidad de Manipulación Genética, Monterrey CP 66455, Nuevo León, Mexico
- Zacarías Jiménez-Salas
  - Universidad Autónoma de Nuevo León, Centro de Investigación en Nutrición y Salud Pública, Monterrey CP 64460, Nuevo León, Mexico
- Debany Marlen Valdez-Rodríguez
  - Universidad Autónoma de Nuevo León, Facultad de Ciencias Biológicas, Departamento de Microbiología, LMYF, Unidad de Manipulación Genética, Monterrey CP 66455, Nuevo León, Mexico
- Claudia Geraldine León-Ramírez
  - Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Unidad Irapuato, Departamento de Ingeniería Genética, Irapuato CP 36824, Guanajuato, Mexico
- José Ruiz-Herrera
  - Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional, Unidad Irapuato, Departamento de Ingeniería Genética, Irapuato CP 36824, Guanajuato, Mexico
- Juan Manuel Adame-Rodríguez
  - Universidad Autónoma de Nuevo León, Facultad de Ciencias Biológicas, Departamento de Microbiología, LMYF, Unidad de Manipulación Genética, Monterrey CP 66455, Nuevo León, Mexico
- Manuel López-Cabanillas-Lomelí
  - Universidad Autónoma de Nuevo León, Centro de Investigación en Nutrición y Salud Pública, Monterrey CP 64460, Nuevo León, Mexico
- Eduardo Campos-Góngora
  - Universidad Autónoma de Nuevo León, Centro de Investigación en Nutrición y Salud Pública, Monterrey CP 64460, Nuevo León, Mexico

2
Engel SR, Wong ED, Nash RS, Aleksander S, Alexander M, Douglass E, Karra K, Miyasato SR, Simison M, Skrzypek MS, Weng S, Cherry JM. New data and collaborations at the Saccharomyces Genome Database: updated reference genome, alleles, and the Alliance of Genome Resources. Genetics 2022; 220:iyab224. [PMID: 34897464 PMCID: PMC9209811 DOI: 10.1093/genetics/iyab224] [Received: 09/15/2021] [Accepted: 11/11/2021] [Indexed: 02/03/2023]
Abstract
Saccharomyces cerevisiae is used to provide fundamental understanding of eukaryotic genetics, gene product function, and cellular biological processes. Saccharomyces Genome Database (SGD) has been supporting the yeast research community since 1993, serving as its de facto hub. Over the years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation, and developed various tools and methods for analysis and curation of a variety of emerging data types. More recently, SGD and six other model organism focused knowledgebases have come together to create the Alliance of Genome Resources to develop sustainable genome information resources that promote and support the use of various model organisms to understand the genetic and genomic bases of human biology and disease. Here we describe recent activities at SGD, including the latest reference genome annotation update, the development of a curation system for mutant alleles, and new pages addressing homology across model organisms as well as the use of yeast to study human disease.
Affiliation(s)
- Stacia R Engel
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Edith D Wong
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Robert S Nash
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Suzi Aleksander
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Micheal Alexander
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Eric Douglass
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Kalpana Karra
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Stuart R Miyasato
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Matt Simison
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Marek S Skrzypek
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- Shuai Weng
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA
- J Michael Cherry
  - Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA

3
Abstract
Determining the evolutionary relationships between genes is fundamental to comparative biological research. Here, we present SHOOT. SHOOT searches a user query sequence against a database of phylogenetic trees and returns a tree with the query sequence correctly placed within it. We show that SHOOT performs this analysis with comparable speed to a BLAST search. We demonstrate that SHOOT phylogenetic placements are as accurate as conventional tree inference, and it can identify orthologs with high accuracy. In summary, SHOOT is a fast and accurate tool for phylogenetic analyses of novel query sequences. It is available online at www.shoot.bio.
Affiliation(s)
- David Mark Emms
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Steven Kelly
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK

4
Ilnitskiy IS, Zharikova AA, Mironov AA. OUP accepted manuscript. Nucleic Acids Res 2022; 50:W534-W540. [PMID: 35610035 PMCID: PMC9252792 DOI: 10.1093/nar/gkac385] [Received: 03/22/2022] [Revised: 04/19/2022] [Accepted: 04/29/2022] [Indexed: 11/27/2022]
Abstract
Extensive amounts of data from next-generation sequencing and omics studies have led to the accumulation of information that provides insight into the evolutionary landscape of related proteins. Here, we present OrthoQuantum, a web server that allows for time-efficient analysis and visualization of phylogenetic profiles of any set of eukaryotic proteins. It is a simple-to-use tool capable of searching large input sets of proteins. Using data from open source databases of orthologous sequences in a wide range of taxonomic groups, it enables users to assess coupled evolutionary patterns and helps define lineage-specific innovations. The web interface allows users to query by gene name or UniProt identifier in different phylogenetic clades and to supplement the presence data with an additional BLAST search. The conservation patterns of proteins are coded as binary vectors, i.e., strings that encode the presence or absence of orthologous proteins in other genomes. These strings are used to calculate the top-scoring correlation pairs needed to find co-inherited proteins, which are simultaneously present or simultaneously absent in specific lineages. Profiles are visualized in combination with phylogenetic trees in a JavaScript-based interface. The OrthoQuantum v1.0 web server is freely available at http://orthoq.bioinf.fbb.msu.ru along with documentation and a tutorial.
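The binary-vector encoding described in this abstract can be illustrated with a minimal Python sketch. It is not OrthoQuantum's code: the protein names, the six-genome toy profiles, the helper names `pearson` and `top_correlated_pairs`, and the choice of Pearson correlation as the scoring measure are all assumptions made for illustration, since the abstract does not say which correlation statistic the server uses.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length 0/1 presence vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A protein present (or absent) everywhere carries no signal.
        return 0.0
    return cov / sqrt(var_a * var_b)

def top_correlated_pairs(profiles, k=3):
    """Return the k highest-scoring (score, protein, protein) triples."""
    scored = [
        (pearson(profiles[p], profiles[q]), p, q)
        for p, q in combinations(sorted(profiles), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Hypothetical profiles: one column per genome, 1 = ortholog present.
profiles = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],  # same pattern as protA (co-inherited)
    "protC": [0, 0, 1, 1, 0, 1],  # exact complement of protA
    "protD": [1, 0, 1, 0, 1, 1],
}

for score, p, q in top_correlated_pairs(profiles):
    print(f"{p} / {q}: {score:+.2f}")
```

In this toy input, protA and protB share an identical presence pattern and therefore rank first, while protA and protC are perfectly anti-correlated. Other similarity measures for binary profiles (for example Jaccard or mutual information) could be swapped in for `pearson` without changing the ranking logic.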
Affiliation(s)
- Anastasia A Zharikova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Lomonosovsky Prospect 27, Building 10, 119991 Moscow, Russia
  - Kharkevich Institute of Information Transmission Problems, Russian Academy of Sciences, Big Karetny Lane 19, Building 1, 127051 Moscow, Russia
- Andrey A Mironov
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Lomonosovsky Prospect 27, Building 10, 119991 Moscow, Russia
  - Kharkevich Institute of Information Transmission Problems, Russian Academy of Sciences, Big Karetny Lane 19, Building 1, 127051 Moscow, Russia

5
Schall PZ, Latham KE. Cross-species meta-analysis of transcriptome changes during the morula-to-blastocyst transition: metabolic and physiological changes take center stage. Am J Physiol Cell Physiol 2021; 321:C913-C931. [PMID: 34669511 DOI: 10.1152/ajpcell.00318.2021] [Indexed: 01/13/2023]
Abstract
The morula-to-blastocyst transition (MBT) culminates with formation of inner cell mass (ICM) and trophectoderm (TE) lineages. Recent studies identified signaling pathways driving lineage specification, but some features of these pathways display significant species divergence. To better understand evolutionary conservation of the MBT, we completed a meta-analysis of RNA sequencing data from five model species and ICM/TE differences from four species. Although many genes change in expression during the MBT within any given species, the number of shared differentially expressed genes (DEGs) is comparatively small, and the number of shared ICM/TE DEGs is even smaller. DEGs related to known lineage determining pathways (e.g., POU5F1) are seen, but the most prominent pathways and functions associated with shared DEGs or shared across individual species DEG lists impact basic physiological and metabolic activities, such as TCA cycle, unfolded protein response, oxidative phosphorylation, sirtuin signaling, mitotic roles of polo-like kinases, NRF2-mediated oxidative stress, estrogen receptor signaling, apoptosis, necrosis, lipid and fatty acid metabolism, cholesterol biosynthesis, endocytosis, AMPK signaling, homeostasis, transcription, and cell death. We also observed prominent differences in transcriptome regulation between ungulates and nonungulates, particularly for ICM- and TE-enhanced mRNAs. These results extend our understanding of shared mechanisms of the MBT and formation of the ICM and TE and should better inform the selection of model species for particular applications.
Affiliation(s)
- Peter Z Schall
  - Department of Animal Science, Michigan State University, East Lansing, Michigan
  - Reproductive and Developmental Sciences Program, Michigan State University, East Lansing, Michigan
  - Comparative Medicine and Integrative Biology Program, Michigan State University, East Lansing, Michigan
- Keith E Latham
  - Department of Animal Science, Michigan State University, East Lansing, Michigan
  - Reproductive and Developmental Sciences Program, Michigan State University, East Lansing, Michigan
  - Department of Obstetrics, Gynecology, & Reproductive Biology, Michigan State University, East Lansing, Michigan

6
Fuentes D, Molina M, Chorostecki U, Capella-Gutiérrez S, Marcet-Houben M, Gabaldón T. PhylomeDB V5: an expanding repository for genome-wide catalogues of annotated gene phylogenies. Nucleic Acids Res 2021; 50:D1062-D1068. [PMID: 34718760 PMCID: PMC8728271 DOI: 10.1093/nar/gkab966] [Received: 09/20/2021] [Revised: 10/02/2021] [Accepted: 10/05/2021] [Indexed: 12/20/2022]
Abstract
PhylomeDB is a unique knowledge base providing public access to minable and browsable catalogues of pre-computed genome-wide collections of annotated sequences, alignments and phylogenies (i.e. phylomes) of homologous genes, as well as to their corresponding phylogeny-based orthology and paralogy relationships. In addition, PhylomeDB trees and alignments can be downloaded for further processing to detect and date gene duplication events, infer past events of inter-species hybridization and horizontal gene transfer, as well as to uncover footprints of selection, introgression, gene conversion, or other relevant evolutionary processes in the genes and organisms of interest. Here, we describe the latest evolution of PhylomeDB (version 5). This new version includes a newly implemented web interface and several new functionalities such as optimized searching procedures, the possibility to create user-defined phylome collections, and a fully redesigned data structure. This release also represents a significant core data expansion, with the database providing access to 534 phylomes, comprising over 8 million trees, and homology relationships for genes in over 6000 species. This makes PhylomeDB the largest and most comprehensive public repository of gene phylogenies. PhylomeDB is available at http://www.phylomedb.org.
Affiliation(s)
- Diego Fuentes
  - Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
- Manuel Molina
  - Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
- Uciel Chorostecki
  - Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
- Marina Marcet-Houben
  - Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
- Toni Gabaldón
  - Barcelona Supercomputing Centre (BSC-CNS). Jordi Girona 29, 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac 10, 08028 Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain

7
Foley S, Ku C, Arshinoff B, Lotay V, Karimi K, Vize PD, Hinman V. Integration of 1:1 orthology maps and updated datasets into Echinobase. Database (Oxford) 2021; 2021:baab030. [PMID: 34010390 PMCID: PMC8132956 DOI: 10.1093/database/baab030] [Received: 01/12/2021] [Revised: 04/23/2021] [Accepted: 04/30/2021] [Indexed: 12/24/2022]
Abstract
Echinobase (https://echinobase.org) is a central online platform that generates, manages and hosts genomic data relevant to echinoderm research. While the resource primarily serves the echinoderm research community, the recent release of an excellent quality genome for the frequently studied purple sea urchin (Strongylocentrotus purpuratus genome, v5.0) has provided an opportunity to adapt to the needs of a broader research community across other model systems. To this end, establishing pipelines to identify orthologous genes between echinoderms and other species has become a priority in many contexts including nomenclature, linking to data in other model organisms, and in internal functionality where data gathered in one hosted species can be associated with genes in other hosted echinoderms. This paper describes the orthology pipelines currently employed by Echinobase and how orthology data are processed to yield 1:1 ortholog mappings between a variety of echinoderms and other model taxa. We also describe functions of interest that have recently been included on the resource, including an updated developmental time course for S. purpuratus, and additional tracks for genome browsing. These data enhancements will increase the accessibility of the resource to non-echinoderm researchers and simultaneously expand the data quality and quantity available to core Echinobase users. Database URL: https://echinobase.org.
Affiliation(s)
- Saoirse Foley
  - Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
  - Echinobase #6-46, Mellon Institute, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA
- Carolyn Ku
  - Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
  - Echinobase #6-46, Mellon Institute, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA
- Brad Arshinoff
  - Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, Alberta TN2 1N4, Canada
- Vaneet Lotay
  - Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, Alberta TN2 1N4, Canada
- Kamran Karimi
  - Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, Alberta TN2 1N4, Canada
- Peter D Vize
  - Department of Biological Sciences, University of Calgary, 2500 University Drive NW, Calgary, Alberta TN2 1N4, Canada
- Veronica Hinman
  - Department of Biological Sciences, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
  - Echinobase #6-46, Mellon Institute, 4400 Fifth Avenue, Pittsburgh, PA 15213, USA

8
Chorostecki U, Molina M, Pryszcz LP, Gabaldón T. MetaPhOrs 2.0: integrative, phylogeny-based inference of orthology and paralogy across the tree of life. Nucleic Acids Res 2020; 48:W553-W557. [PMID: 32343307 PMCID: PMC7319458 DOI: 10.1093/nar/gkaa282] [Received: 02/28/2020] [Revised: 04/01/2020] [Accepted: 04/25/2020] [Indexed: 12/23/2022]
Abstract
Inferring homology relationships across genes in different species is a central task in comparative genomics. Therefore, a large number of resources and methods have been developed over the years. Some public databases include phylogenetic trees of homologous gene families which can be used to further differentiate homology relationships into orthology and paralogy. MetaPhOrs is a web server that integrates phylogenetic information from different sources to provide orthology and paralogy relationships based on a common phylogeny-based predictive algorithm and associated with a consistency-based confidence score. Here we describe the latest version of the web server, which includes major new implementations and provides orthology and paralogy relationships derived from ∼8.2 million gene family trees from 13 different source repositories across ∼4000 species with sequenced genomes. The MetaPhOrs server is freely available, without registration, at http://orthology.phylomedb.org/.
Affiliation(s)
- Uciel Chorostecki
  - Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Manuel Molina
  - Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
- Leszek P Pryszcz
  - Centre for Genomic Regulation, 08003 Barcelona, Spain
  - International Institute of Molecular and Cell Biology, 4 Ks. Trojdena Street, 02-109 Warsaw, Poland
- Toni Gabaldón
  - Barcelona Supercomputing Centre (BSC-CNS), 08034 Barcelona, Spain
  - Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
  - ICREA, 08010 Barcelona, Spain

9
Marchant A, Cisneros AF, Dubé AK, Gagnon-Arsenault I, Ascencio D, Jain H, Aubé S, Eberlein C, Evans-Yamamoto D, Yachie N, Landry CR. The role of structural pleiotropy and regulatory evolution in the retention of heteromers of paralogs. eLife 2019; 8:46754. [PMID: 31454312 PMCID: PMC6711710 DOI: 10.7554/elife.46754] [Received: 03/11/2019] [Accepted: 08/11/2019] [Indexed: 01/07/2023]
Abstract
Gene duplication is a driver of the evolution of new functions. The duplication of genes encoding homomeric proteins leads to the formation of homomers and heteromers of paralogs, creating new complexes after a single duplication event. The loss of these heteromers may be required for the two paralogs to evolve independent functions. Using yeast as a model, we find that heteromerization is frequent among duplicated homomers and correlates with functional similarity between paralogs. Using in silico evolution, we show that for homomers and heteromers sharing binding interfaces, mutations in one paralog can have structural pleiotropic effects on both interactions, resulting in highly correlated responses of the complexes to selection. Therefore, heteromerization could be preserved indirectly due to selection for the maintenance of homomers, thus slowing down functional divergence between paralogs. We suggest that paralogs can overcome the obstacle of structural pleiotropy by regulatory evolution at the transcriptional and post-translational levels.
Affiliation(s)
- Axelle Marchant
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
  - Département de biologie, Université Laval, Québec, Canada
- Angel F Cisneros
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Alexandre K Dubé
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
  - Département de biologie, Université Laval, Québec, Canada
- Isabelle Gagnon-Arsenault
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
  - Département de biologie, Université Laval, Québec, Canada
- Diana Ascencio
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
  - Département de biologie, Université Laval, Québec, Canada
- Honey Jain
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
  - Department of Biological Sciences, Birla Institute of Technology and Sciences, Pilani, India
- Simon Aubé
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
- Chris Eberlein
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
  - Département de biologie, Université Laval, Québec, Canada
- Daniel Evans-Yamamoto
  - Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Japan
- Nozomu Yachie
  - Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Japan
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, Japan
- Christian R Landry
  - Département de biochimie, de microbiologie et de bio-informatique, Université Laval, Québec, Canada
  - PROTEO, le réseau québécois de recherche sur la fonction, la structure et l'ingénierie des protéines, Université Laval, Québec, Canada
  - Centre de Recherche en Données Massives (CRDM), Université Laval, Québec, Canada
  - Département de biologie, Université Laval, Québec, Canada

10
Puigbò P, Wolf YI, Koonin EV. Genome-Wide Comparative Analysis of Phylogenetic Trees: The Prokaryotic Forest of Life. Methods Mol Biol 2019; 1910:241-269. [PMID: 31278667 DOI: 10.1007/978-1-4939-9074-0_8] [Indexed: 12/12/2022]
Abstract
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the boot-split distance (BSD) method is introduced as an extension of the previously developed split distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting treelike and netlike evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe how these methods were applied to analyze the FOL and the results obtained with them. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."
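The idea of comparing trees by their splits, and of letting bootstrap support modulate that comparison, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published BSD formula: the abstract does not give the exact weighting, so the scheme below (each split contributes its bootstrap support as a weight) is a simple stand-in, and the function names, taxa, and support values are hypothetical. Each tree is assumed to be already reduced to a mapping from a non-trivial split, stored as the frozenset of taxa on one consistently chosen side, to its bootstrap support in [0, 1].

```python
def split_distance(tree_a, tree_b):
    """Unweighted split distance: fraction of splits found in only one tree."""
    splits_a, splits_b = set(tree_a), set(tree_b)
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    # Symmetric difference = splits present in exactly one of the two trees.
    return len(splits_a ^ splits_b) / total

def bootstrap_weighted_split_distance(tree_a, tree_b):
    """Illustrative weighted variant (an assumption, not the published BSD):
    each split counts in proportion to its bootstrap support."""
    shared = set(tree_a) & set(tree_b)
    total_weight = sum(tree_a.values()) + sum(tree_b.values())
    if total_weight == 0:
        return 0.0
    shared_weight = sum(tree_a[s] + tree_b[s] for s in shared)
    return 1.0 - shared_weight / total_weight

# Hypothetical five-taxon trees (taxa A-E), splits keyed by one side.
tree_a = {
    frozenset({"A", "B"}): 0.95,       # well supported, shared with tree_b
    frozenset({"A", "B", "C"}): 0.40,  # weakly supported, unique to tree_a
}
tree_b = {
    frozenset({"A", "B"}): 0.90,
    frozenset({"A", "B", "D"}): 0.30,  # weakly supported, unique to tree_b
}

print(split_distance(tree_a, tree_b))
print(bootstrap_weighted_split_distance(tree_a, tree_b))
```

In this toy case the two trees disagree only on weakly supported splits, so the weighted distance comes out lower than the unweighted one, which is the qualitative behavior the abstract attributes to taking bootstrap support into account.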
Affiliation(s)
- Pere Puigbò
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
  - Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA

11
Abstract
Predicting mitochondrial localization of proteins remains challenging for two main reasons: (1) Not only one but several mitochondrial localization signals exist, which primarily dictate the final destination of a protein in this organelle. However, most localization prediction algorithms rely on the presence of a so-called presequence (or N-terminal mitochondrial targeting peptide, mTP), which occurs in only ~70% of mitochondrial proteins. (2) The presequence is highly divergent on sequence level and therefore difficult to identify on the computer. In this chapter, we review a number of protein localization prediction programs and propose a strategy to predict mitochondrial localization. Finally, we give some helpful suggestions for bench scientists when working with mitochondrial protein candidates in silico.

12
Rane RV, Oakeshott JG, Nguyen T, Hoffmann AA, Lee SF. Orthonome - a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes. BMC Genomics 2017; 18:673. [PMID: 28859620 PMCID: PMC5580312 DOI: 10.1186/s12864-017-4079-6] [Received: 01/15/2017] [Accepted: 08/21/2017] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge. RESULTS: Here we present Orthonome, an orthologue prediction pipeline, designed to reduce the trade-off between orthologue capture rates (recall) and accuracy of multi-species orthologue prediction. The pipeline compares sequence domains and then forms sequence-similar clusters before using phylogenetic comparisons to identify inparalogues. It then corrects sequence similarity metrics for fragment and gene length bias using a novel scoring metric capturing relationships between full length as well as fragmented genes. The remaining genes are then brought together for the identification of orthologues within a phylogenetic framework. The orthologue predictions are further calibrated along with inparalogues and gene births, using synteny, to identify novel orthologous relationships. We use 12 high quality Drosophila genomes to show that, compared to other orthologue prediction pipelines, Orthonome provides orthogroups with minimal error but high recall. Furthermore, Orthonome is resilient to suboptimal assembly/annotation quality, with the inclusion of draft genomes from eight additional Drosophila species still providing >6500 1:1 orthologues across all twenty species while retaining a better combination of accuracy and recall than other pipelines. Orthonome is implemented as a searchable database and query tool along with multiple-sequence alignment browsers for all sets of orthologues. The underlying documentation and database are accessible at http://www.orthonome.com.
CONCLUSION: We demonstrate that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines. The study also highlights a greater degree of evolutionary conservation across drosophilid species than earlier thought.
Affiliation(s)
- Rahul V Rane
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia; CSIRO, Canberra, Australian Capital Territory, Australia
- Thu Nguyen
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Ary A Hoffmann
- Bio21 Institute, School of Biosciences, The University of Melbourne, Melbourne, Victoria, Australia
- Siu F Lee
- CSIRO, Canberra, Australian Capital Territory, Australia; Department of Biological Sciences, Macquarie University, Sydney, New South Wales, Australia
|
13
|
Abstract
Correctly estimating the age of a gene or gene family is important for a variety of fields, including molecular evolution, comparative genomics, and phylogenetics, and increasingly for systems biology and disease genetics. However, most studies use only a point estimate of a gene’s age, neglecting the substantial uncertainty involved in this estimation. Here, we characterize this uncertainty by investigating the effect of algorithm choice on gene-age inference and calculate consensus gene ages with attendant error distributions for a variety of model eukaryotes. We use 13 orthology inference algorithms to create gene-age datasets and then characterize the error around each age-call on a per-gene and per-algorithm basis. Systematic error was found to be a large factor in estimating gene age, suggesting that simple consensus algorithms are not enough to give a reliable point estimate. We also found that different sources of error can affect downstream analyses, such as gene ontology enrichment. Our consensus gene-age datasets, with associated error terms, are made fully available at geneages.org so that researchers can propagate this uncertainty through their analyses.
Affiliation(s)
- Benjamin J Liebeskind
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin; Center for Computational Biology and Bioinformatics, University of Texas at Austin
- Claire D McWhite
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
- Edward M Marcotte
- Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
|
14
|
Lustig AJ. Hypothesis: Paralog Formation from Progenitor Proteins and Paralog Mutagenesis Spur the Rapid Evolution of Telomere Binding Proteins. Front Genet 2016; 7:10. [PMID: 26904098 PMCID: PMC4748036 DOI: 10.3389/fgene.2016.00010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/29/2015] [Accepted: 01/22/2016] [Indexed: 12/31/2022] Open
Abstract
Through elegant studies in fungal cells and complex organisms, we propose a unifying paradigm for the rapid evolution of telomere binding proteins (TBPs) that associate with either (or both) telomeric DNA and telomeric proteins. TBPs protect and regulate telomere structure and function. Four critical factors are involved. First, TBPs that commonly bind to telomeric DNA include the c-Myb binding proteins, OB-fold single-stranded binding proteins, and G-G base paired Hoogsteen structure (G4) binding proteins. Each contributes independently or, in some cases, cooperatively, to provide a minimum level of telomere function. As a result of these minimal requirements and the great abundance of homologs of these motifs in the proteome, DNA telomere-binding activity may be generated more easily than expected. Second, telomere dysfunction gives rise to genome instability, through the elevation of recombination rates, genome ploidy, and the frequency of gene mutations. The formation of paralogs that diverge from their progenitor proteins ultimately can form a high frequency of altered TBPs with altered functions. Third, TBPs that assemble into complexes (e.g., mammalian shelterin) derive benefits from the novel emergent functions. Fourth, a limiting factor in the evolution of TBP complexes is the formation of mutually compatible interaction surfaces amongst the TBPs. These factors may have different degrees of importance in the evolution of different phyla, illustrated by the apparently simpler telomeres in complex plants. Selective pressures can utilize the mechanisms of paralog formation and mutagenesis to drive TBP evolution along routes dependent on the requisite physiologic changes.
Affiliation(s)
- Arthur J Lustig
- Department of Biochemistry and Molecular Biology, Tulane University, New Orleans, LA, USA
|
15
|
Horiike T, Minai R, Miyata D, Nakamura Y, Tateno Y. Ortholog-Finder: A Tool for Constructing an Ortholog Data Set. Genome Biol Evol 2016; 8:446-57. [PMID: 26782935 PMCID: PMC4779612 DOI: 10.1093/gbe/evw005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022] Open
Abstract
Orthologs are widely used for phylogenetic analysis of species; however, identifying genuine orthologs among distantly related species is challenging, because genes obtained through horizontal gene transfer (HGT) and out-paralogs derived from gene duplication before speciation are often present among the predicted orthologs. We developed a program, “Ortholog-Finder,” to obtain ortholog data sets for performing phylogenetic analysis by using all open-reading frame data of species. The program includes five processes for minimizing the effects of HGT and out-paralogs in phylogeny construction: 1) HGT filtering: Genes derived from HGT could be detected and deleted from the initial sequence data set by examining their base compositions. 2) Out-paralog filtering: Out-paralogs are detected and deleted from the data set based on sequence similarity. 3) Classification of phylogenetic trees: Phylogenetic trees generated for ortholog candidates are classified as monophyletic or polyphyletic trees. 4) Tree splitting: Polyphyletic trees are bisected to obtain monophyletic trees and remove HGT genes and out-paralogs. 5) Threshold changing: Out-paralogs are further excluded from the data set based on the difference in the similarity scores of genuine orthologs and out-paralogs. We examined how out-paralogs and HGTs affected phylogenetic trees constructed for species based on ortholog data sets obtained by Ortholog-Finder with the use of simulation data, and we determined the effects of confounding factors. We then used Ortholog-Finder in phylogeny construction for 12 Gram-positive bacteria from two phyla and validated each node of the constructed tree by comparison with individually constructed ortholog trees.
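As a rough illustration of the first step listed above (HGT filtering by base composition), the sketch below flags genes whose GC content is a strong outlier relative to the other genes in the same genome. The abstract does not say how Ortholog-Finder measures base composition, so the GC-content z-score, the cutoff of 2.0, and the function names `gc_content` and `flag_atypical_genes` are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def flag_atypical_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> set[str]:
    """Return IDs of genes whose GC content is a z-score outlier.

    Hypothetical stand-in for a base-composition HGT filter: genes whose
    GC content lies more than `z_cutoff` population standard deviations
    from the genome-wide mean are flagged as candidate HGT genes.
    """
    gc = {gene_id: gc_content(seq) for gene_id, seq in genes.items()}
    mu = mean(gc.values())
    sigma = pstdev(gc.values())
    if sigma == 0:
        # All genes have identical composition; nothing stands out.
        return set()
    return {gene_id for gene_id, value in gc.items()
            if abs(value - mu) / sigma > z_cutoff}
```

A real filter would likely use richer composition signals (codon usage, oligonucleotide frequencies) and a cutoff calibrated per genome; the single GC statistic here only shows the shape of the idea.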
Affiliation(s)
- Tokumasa Horiike
- Department of Biological and Environmental Science, Shizuoka University, Japan
- Ryoichi Minai
- The Genome Institute, Japanese Foundation of Cancer Research, Tokyo, Japan
- Daisuke Miyata
- Department of Economics, Chiba University of Commerce, Ichikawa, Japan
- Yoji Nakamura
- Research Center for Aquatic Genomics, National Research Institute of Fisheries Science, Fisheries Research Agency, Kanagawa, Japan
- Yoshio Tateno
- School of New Sciences, Daegu Gyoungbook Institute of Science and Technology, Daegu, Republic of Korea
|
16
|
Oh Brother, Where Art Thou? Finding Orthologs in the Twilight and Midnight Zones of Sequence Similarity. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
17
|
García-Rodas R, Trevijano-Contador N, Román E, Janbon G, Moyrand F, Pla J, Casadevall A, Zaragoza O. Role of Cln1 during melanization of Cryptococcus neoformans. Front Microbiol 2015; 6:798. [PMID: 26322026 PMCID: PMC4532930 DOI: 10.3389/fmicb.2015.00798] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/16/2015] [Accepted: 07/22/2015] [Indexed: 11/24/2022] Open
Abstract
Cryptococcus neoformans is an opportunistic fungal pathogen that has several well-described virulence determinants. A polysaccharide capsule and the ability to produce melanin are among the most important. Melanization occurs both in vitro, in the presence of catecholamine and indole compounds, and in vivo during infection. Despite the importance of melanin production for cryptococcal virulence, the components and mechanisms involved in its synthesis have not been fully elucidated. In this work, we describe the role of a G1/S cyclin (Cln1) in the melanization process. Cln1 has evolved specifically with proteins present only in other basidiomycetes. We found that Cln1 is required for cell wall stability and production of melanin in C. neoformans. Absence of melanization correlated with a defect in the expression of the LAC1 gene. The relation between cell cycle elements and melanization was confirmed by the effect of drugs that cause cell cycle arrest at a specific phase, such as rapamycin. The cln1 mutant was consistently more susceptible to oxidative damage in a medium that induces melanization. Our results strongly suggest a novel and hitherto unrecognized role for C. neoformans Cln1 in the expression of virulence traits.
Affiliation(s)
- Rocío García-Rodas
- Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III, Majadahonda, Spain
- Nuria Trevijano-Contador
- Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III, Majadahonda, Spain
- Elvira Román
- Department of Microbiology, Pharmacy Faculty, Complutense University of Madrid, Madrid, Spain
- Guilhem Janbon
- Unité Biologie et Pathogénicité Fongiques, Institut Pasteur, Paris, France
- Frédérique Moyrand
- Unité Biologie et Pathogénicité Fongiques, Institut Pasteur, Paris, France
- Jesús Pla
- Department of Microbiology, Pharmacy Faculty, Complutense University of Madrid, Madrid, Spain
- Arturo Casadevall
- Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Oscar Zaragoza
- Mycology Reference Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III, Majadahonda, Spain
|
18
|
Genome-Wide Collation of the Plasmodium falciparum WDR Protein Superfamily Reveals Malarial Parasite-Specific Features. PLoS One 2015; 10:e0128507. [PMID: 26043001 PMCID: PMC4456382 DOI: 10.1371/journal.pone.0128507] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/24/2014] [Accepted: 04/29/2015] [Indexed: 01/10/2023] Open
Abstract
Despite a significant drop in malaria deaths during the past decade, malaria continues to be one of the biggest health problems around the globe. WD40 repeat (WDR)-containing proteins comprise one of the largest and most functionally diverse protein superfamilies in eukaryotes, acting as scaffolds for assembling large protein complexes. In the present study, we report an extensive in silico analysis of the WDR gene family in the human malaria parasite Plasmodium falciparum. Our genome-wide identification has revealed 80 putative WDR genes in P. falciparum (PfWDRs). Five distinct domain compositions were discovered in Plasmodium as compared to the human host. Notably, 31 PfWDRs were annotated/re-annotated on the basis of their orthologs in other species. Interestingly, most PfWDRs were larger than their human homologs, highlighting the presence of parasite-specific insertions. Fifteen PfWDRs appeared specific to Plasmodium, with no assigned orthologs. Expression profiling of PfWDRs revealed a mixture of linear and nonlinear relationships between transcriptome and proteome, and only nine PfWDRs were found to be stage-specific. Homology modeling identified conservation of major binding sites in PfCAF-1 and PfRACK. Protein-protein interaction network analyses suggested that PfWDRs are highly connected proteins with ~1928 potential interactions, supporting their role as hubs in cellular networks. The present study highlights the roles and relevance of the WDR family in P. falciparum, and identifies unique features that lay a foundation for further experimental dissection of PfWDRs.
|
19
|
Green RE, Braun EL, Armstrong J, Earl D, Nguyen N, Hickey G, Vandewege MW, St John JA, Capella-Gutiérrez S, Castoe TA, Kern C, Fujita MK, Opazo JC, Jurka J, Kojima KK, Caballero J, Hubley RM, Smit AF, Platt RN, Lavoie CA, Ramakodi MP, Finger JW, Suh A, Isberg SR, Miles L, Chong AY, Jaratlerdsiri W, Gongora J, Moran C, Iriarte A, McCormack J, Burgess SC, Edwards SV, Lyons E, Williams C, Breen M, Howard JT, Gresham CR, Peterson DG, Schmitz J, Pollock DD, Haussler D, Triplett EW, Zhang G, Irie N, Jarvis ED, Brochu CA, Schmidt CJ, McCarthy FM, Faircloth BC, Hoffmann FG, Glenn TC, Gabaldón T, Paten B, Ray DA. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs. Science 2014; 346:1254449. [PMID: 25504731 PMCID: PMC4386873 DOI: 10.1126/science.1254449] [Citation(s) in RCA: 239] [Impact Index Per Article: 23.9] [Indexed: 12/28/2022]
Abstract
To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs.
Affiliation(s)
- Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA.
- Edward L Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Joel Armstrong
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA
- Dent Earl
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA
- Ngan Nguyen
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA
- Glenn Hickey
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA
- Michael W Vandewege
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA
- John A St John
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA
- Salvador Capella-Gutiérrez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, 08003 Barcelona, Spain. Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Todd A Castoe
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA. Department of Biology, University of Texas, Arlington, TX 76019, USA
- Colin Kern
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19717, USA
- Matthew K Fujita
- Department of Biology, University of Texas, Arlington, TX 76019, USA
- Juan C Opazo
- Instituto de Ciencias Ambientales y Evolutivas, Facultad de Ciencias, Universidad Austral de Chile, Valdivia, Chile
- Jerzy Jurka
- Genetic Information Research Institute, Mountain View, CA 94043, USA
- Kenji K Kojima
- Genetic Information Research Institute, Mountain View, CA 94043, USA
- Arian F Smit
- Institute for Systems Biology, Seattle, WA 98109, USA
- Roy N Platt
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA
- Christine A Lavoie
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA
- Meganathan P Ramakodi
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA
- John W Finger
- Department of Environmental Health Science, University of Georgia, Athens, GA 30602, USA
- Alexander Suh
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany. Department of Evolutionary Biology (EBC), Uppsala University, SE-752 36 Uppsala, Sweden
- Sally R Isberg
- Porosus Pty. Ltd., Palmerston, NT 0831, Australia. Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia. Centre for Crocodile Research, Noonamah, NT 0837, Australia
- Lee Miles
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
- Amanda Y Chong
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
- Jaime Gongora
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
- Christopher Moran
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW 2006, Australia
- Andrés Iriarte
- Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- John McCormack
- Moore Laboratory of Zoology, Occidental College, Los Angeles, CA 90041, USA
- Shane C Burgess
- College of Agriculture and Life Sciences, University of Arizona, Tucson, AZ 85721, USA
- Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Eric Lyons
- School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
- Christina Williams
- Department of Molecular Biomedical Sciences, North Carolina State University, Raleigh, NC 27607, USA
- Matthew Breen
- Department of Molecular Biomedical Sciences, North Carolina State University, Raleigh, NC 27607, USA
- Jason T Howard
- Howard Hughes Medical Institute, Department of Neurobiology, Duke University Medical Center, Durham, NC 27710, USA
- Cathy R Gresham
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA
- Daniel G Peterson
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA. Department of Plant and Soil Sciences, Mississippi State University, Mississippi State, MS 39762, USA
- Jürgen Schmitz
- Institute of Experimental Pathology (ZMBE), University of Münster, D-48149 Münster, Germany
- David D Pollock
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- David Haussler
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA. Howard Hughes Medical Institute, Bethesda, MD 20814, USA
- Eric W Triplett
- Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China. Center for Social Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Naoki Irie
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, Japan
- Erich D Jarvis
- Howard Hughes Medical Institute, Department of Neurobiology, Duke University Medical Center, Durham, NC 27710, USA
- Christopher A Brochu
- Department of Earth and Environmental Sciences, University of Iowa, Iowa City, IA 52242, USA
- Carl J Schmidt
- Department of Animal and Food Sciences, University of Delaware, Newark, DE 19717, USA
- Fiona M McCarthy
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ 85721, USA
- Brant C Faircloth
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90019, USA. Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Federico G Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA
- Travis C Glenn
- Department of Environmental Health Science, University of Georgia, Athens, GA 30602, USA
- Toni Gabaldón
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, 08003 Barcelona, Spain. Universitat Pompeu Fabra, 08003 Barcelona, Spain. Institució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA
- David A Ray
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA. Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA.
|
20
|
Sharma AR, Chakraborty C, Lee SS, Sharma G, Yoon JK, George Priya Doss C, Song DK, Nam JS. Computational biophysical, biochemical, and evolutionary signature of human R-spondin family proteins, the member of canonical Wnt/β-catenin signaling pathway. BioMed Research International 2014; 2014:974316. [PMID: 25276837 PMCID: PMC4172882 DOI: 10.1155/2014/974316] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/04/2014] [Revised: 07/12/2014] [Accepted: 07/12/2014] [Indexed: 12/27/2022]
Abstract
In humans, the Wnt/β-catenin signaling pathway plays a significant role in cell growth, cell development, and disease pathogenesis. Four human R-spondins (Rspos) are known to activate the canonical Wnt/β-catenin signaling pathway. Presently, Rspos serve as therapeutic targets for several human diseases. Hence, a basic understanding of the molecular properties of Rspos is essential. We approached this issue by interpreting the biochemical and biophysical properties, along with the molecular evolution, of Rspos through computational methods. Our analysis shows that signal peptide length is roughly similar across the Rspo family, as is the amino acid (aa) distribution pattern. In Rspo3, four N-glycosylation sites were noted. All members are hydrophilic in nature and showed approximately similar GRAVY values. Conversely, Rspo3 contains the most positively charged residues while Rspo4 contains the fewest. Four highly aligned blocks were recorded through Gblocks. Phylogenetic analysis shows that Rspo4 is rooted with Rspo2 and, similarly, that Rspo3 and Rspo1 share a common point of origin. Through a phylogenomics study, we developed a phylogenetic tree of sixty proteins (n = 60) with the ortholog and paralog seed sequences. A protein-protein network was also illustrated. The results demonstrated in our study may help future researchers to uncover significant physiological and therapeutic properties of Rspos in various disease models.
Affiliation(s)
- Ashish Ranjan Sharma
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University Hospital, College of Medicine, Chuncheon-si, Gangwon-do 200-704, Republic of Korea
- Chiranjib Chakraborty
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Department of Bioinformatics, School of Computer Sciences, Galgotias University, Greater Noida 203201, India
- Sang-Soo Lee
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Garima Sharma
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Jeong Kyo Yoon
- Center for Molecular Medicine, Maine Medical Center Research Institute, 81 Research Drive, Scarborough, ME 04074, USA
- C. George Priya Doss
- Medical Biotechnology Division, School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu 632014, India
- Dong-Keun Song
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
- Ju-Suk Nam
- Institute for Skeletal Aging & Orthopedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon 200704, Republic of Korea
|
21
|
Chiba H, Uchiyama I. Improvement of domain-level ortholog clustering by optimizing domain-specific sum-of-pairs score. BMC Bioinformatics 2014; 15:148. [PMID: 24885064 PMCID: PMC4035852 DOI: 10.1186/1471-2105-15-148] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/28/2013] [Accepted: 05/06/2014] [Indexed: 01/11/2023] Open
Abstract
Background: Identification of ortholog groups is a crucial step in comparative analysis of multiple genomes. Although several computational methods have been developed to create ortholog groups, most of those methods do not evaluate orthology at the sub-gene level. In our method for domain-level ortholog clustering, DomClust, proteins are split into domains on the basis of alignment boundaries identified by all-against-all pairwise comparison, but it often fails to determine appropriate boundaries. Results: We developed a method to improve domain-level ortholog classification using multiple alignment information. This method is based on a scoring scheme, the domain-specific sum-of-pairs (DSP) score, which evaluates ortholog clustering results at the domain level as the sum total of domain-level alignment scores. We developed a refinement pipeline to improve domain-level clustering, DomRefine, by optimizing the DSP score. We applied DomRefine to domain-level ortholog groups created by DomClust using a dataset obtained from the Microbial Genome Database for Comparative Analysis (MBGD), and evaluated the results using COG clusters and TIGRFAMs models as the reference data. We observed that the agreement between the resulting classification and the classifications in the reference databases improved at almost every step in the refinement pipeline. Moreover, the refined classification showed better agreement than the classifications in the eggNOG databases when TIGRFAMs was used as the reference database. Conclusions: DomRefine is a useful tool for improving the quality of domain-level ortholog classification among microbial genomes. Combined with a rapid domain-level ortholog clustering method, such as DomClust, it can be used to create a high-quality ortholog database that can serve as a solid basis for various comparative genome analyses.
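To make the idea of a sum-of-pairs score concrete, the sketch below computes a generic sum-of-pairs score over a small multiple alignment by adding up a pairwise score for every pair of rows in every column. It is not the DSP score itself: the abstract does not give the domain-specific scoring details, so the match/mismatch/gap values and the names `pair_score` and `sum_of_pairs` are assumptions chosen only to illustrate the general technique.

```python
from itertools import combinations


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one aligned pair of characters with a toy scoring scheme."""
    if a == "-" and b == "-":
        return 0  # two gaps aligned to each other carry no information
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment: list[str]) -> int:
    """Return the sum-of-pairs score of a multiple sequence alignment.

    Every column contributes the sum of pair_score over all pairs of rows.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total
```

A refinement procedure in this spirit would compare such scores before and after moving a domain boundary or merging two clusters and keep the change only if the score improves; how DomRefine actually does this is described in the paper rather than here.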
Affiliation(s)
- Ikuo Uchiyama
- National Institute for Basic Biology, National Institutes of Natural Sciences, Nishigonaka 38, Myodaiji, Okazaki 444-8585, Japan.
|
22
|
Huerta-Cepas J, Capella-Gutiérrez S, Pryszcz LP, Marcet-Houben M, Gabaldón T. PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome. Nucleic Acids Res 2013; 42:D897-902. [PMID: 24275491 PMCID: PMC3964985 DOI: 10.1093/nar/gkt1177] [Citation(s) in RCA: 190] [Impact Index Per Article: 17.3] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic trees representing the evolutionary relationships of homologous genes are the entry point for many evolutionary analyses. For instance, the use of a phylogenetic tree can aid in the inference of orthology and paralogy relationships, and in the detection of relevant evolutionary events such as gene family expansions and contractions, horizontal gene transfer, recombination or incomplete lineage sorting. Similarly, given the plurality of evolutionary histories among genes encoded in a given genome, there is a need for the combined analysis of genome-wide collections of phylogenetic trees (phylomes). Here, we introduce a new release of PhylomeDB (http://phylomedb.org), a public repository of phylomes. Currently, PhylomeDB hosts 120 public phylomes, comprising >1.5 million maximum likelihood trees and multiple sequence alignments. In the current release, phylogenetic trees are annotated with taxonomic, protein-domain arrangement, functional and evolutionary information. PhylomeDB is also a major source for phylogeny-based predictions of orthology and paralogy, covering >10 million proteins across 1059 sequenced species. Here we describe newly implemented PhylomeDB features, and discuss a benchmark of the orthology predictions provided by the database, the impact of proteome updates and the use of the phylome approach in the analysis of newly sequenced genomes and transcriptomes.
Collapse
Affiliation(s)
- Jaime Huerta-Cepas
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader, 88. 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain and Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
23
Rabatel A, Febvay G, Gaget K, Duport G, Baa-Puyoulet P, Sapountzis P, Bendridi N, Rey M, Rahbé Y, Charles H, Calevro F, Colella S. Tyrosine pathway regulation is host-mediated in the pea aphid symbiosis during late embryonic and early larval development. BMC Genomics 2013; 14:235. [PMID: 23575215 PMCID: PMC3660198 DOI: 10.1186/1471-2164-14-235] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 09/19/2012] [Accepted: 03/14/2013] [Indexed: 11/30/2022]
Abstract
BACKGROUND Nutritional symbioses play a central role in insects' adaptation to specialized diets and in their evolutionary success. The obligatory symbiosis between the pea aphid, Acyrthosiphon pisum, and the bacterium, Buchnera aphidicola, is no exception as it enables this important agricultural pest insect to develop on a diet exclusively based on plant phloem sap. The symbiotic bacteria provide the host with essential amino acids lacking in its diet but necessary for the rapid embryonic growth seen in the parthenogenetic viviparous reproduction of aphids. The aphid furnishes, in exchange, non-essential amino acids and other important metabolites. Understanding the regulations acting on this integrated metabolic system during the development of this insect is essential in elucidating aphid biology. RESULTS We used a microarray-based approach to analyse gene expression in the late embryonic and the early larval stages of the pea aphid, characterizing, for the first time, the transcriptional profiles in these developmental phases. Our analyses allowed us to identify key genes in the phenylalanine, tyrosine and dopamine pathways and we identified ACYPI004243, one of the four genes encoding aspartate transaminase (E.C. 2.6.1.1), as specifically regulated during development. Indeed, the tyrosine biosynthetic pathway is crucial for the symbiotic metabolism as it is shared between the two partners, all the precursors being produced by B. aphidicola. Our microarray data are supported by HPLC amino acid analyses demonstrating an accumulation of tyrosine at the same developmental stages, with an up-regulation of the tyrosine biosynthetic genes. Tyrosine is also essential for the synthesis of cuticular proteins and it is an important precursor for cuticle maturation: together with the up-regulation of tyrosine biosynthesis, we observed an up-regulation of cuticular gene expression. We were also able to identify some amino acid transporter genes which are essential for the switch over to the late embryonic stages in pea aphid development. CONCLUSIONS Our data show that, in the development of A. pisum, a specific host gene set regulates the biosynthetic pathways of amino acids, demonstrating how the regulation of gene expression enables an insect to control the production of metabolites crucial for its own development and symbiotic metabolism.
Affiliation(s)
- Andréane Rabatel
- Insa-Lyon, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Gérard Febvay
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Karen Gaget
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Gabrielle Duport
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Patrice Baa-Puyoulet
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Panagiotis Sapountzis
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Nadia Bendridi
- Insa-Lyon, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Marjolaine Rey
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Yvan Rahbé
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Inria Rhône-Alpes, Bamboo, Monbonnot Saint-Martin, F-38330, France
- Hubert Charles
- Insa-Lyon, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Inria Rhône-Alpes, Bamboo, Monbonnot Saint-Martin, F-38330, France
- Federica Calevro
- Insa-Lyon, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
- Stefano Colella
- Inra, UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, Villeurbanne, F-69621, France
- Université de Lyon, Lyon, F-69000, France
24
Martyanov V, Gross RH. Computational discovery of transcriptional regulatory modules in fungal ribosome biogenesis genes reveals novel sequence and function patterns. PLoS One 2013; 8:e59851. [PMID: 23555806 PMCID: PMC3612091 DOI: 10.1371/journal.pone.0059851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2013] [Accepted: 02/20/2013] [Indexed: 11/24/2022]
Abstract
Genes involved in ribosome biogenesis and assembly (RBA) are responsible for ribosome formation. In Saccharomyces cerevisiae, their transcription is regulated by two dissimilar DNA motifs. We were interested in analyzing conservation and divergence of RBA transcription regulation machinery throughout fungal evolution. We have identified orthologs of S. cerevisiae RBA genes in 39 species across fungal phylogeny and searched upstream regions of these gene sets for DNA sequences significantly similar to S. cerevisiae RBA regulatory motifs. In addition to confirming known motif arrangements comprising two different motifs in a set of S. cerevisiae close relatives or two instances of the same motif (that we refer to as modules), we have also discovered novel modules in a group of fungi closely related to Neurospora crassa. Despite a single nucleotide difference between consensus sequences of RBA motifs, modules associated with the S. cerevisiae group and the N. crassa group displayed consistently different characteristics with respect to preferred module organization and several other module properties. For a given species, we have found a correlation between the configuration of the RBA module and significant enrichment in a set of specific Gene Ontology biological processes. We have identified several likely new candidates for a role in ribosome biogenesis in S. cerevisiae based on the combined evidence of RBA module presence in the upstream regions, functional annotation information and microarray expression profiles. We believe that this approach will be useful in terms of generating hypotheses about functional roles of genes for which only fragmentary data from a single source are available.
Affiliation(s)
- Viktor Martyanov
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, United States of America
- Robert H. Gross
- Department of Biological Sciences, Dartmouth College, Hanover, New Hampshire, United States of America
25
|
The family structure of the Mucorales: a synoptic revision based on comprehensive multigene-genealogies. Persoonia - Molecular Phylogeny and Evolution of Fungi 2013; 30:57-76. [PMID: 24027347 PMCID: PMC3734967 DOI: 10.3767/003158513x666259] [Citation(s) in RCA: 113] [Impact Index Per Article: 10.3] [Received: 09/29/2012] [Accepted: 01/01/2013] [Indexed: 02/01/2023]
Abstract
The Mucorales (Mucoromycotina) are one of the most ancient groups of fungi comprising ubiquitous, mostly saprotrophic organisms. The first comprehensive molecular studies 11 yr ago revealed the traditional classification scheme, mainly based on morphology, as highly artificial. Since then only single clades have been investigated in detail but a robust classification of the higher levels based on DNA data has not been published yet. Therefore we provide a classification based on a phylogenetic analysis of four molecular markers including the large and the small subunit of the ribosomal DNA, the partial actin gene and the partial gene for the translation elongation factor 1-alpha. The dataset comprises 201 isolates in 103 species and represents about one half of the currently accepted species in this order. Previous family concepts are reviewed and the family structure inferred from the multilocus phylogeny is introduced and discussed. The main differences between the current classification and preceding concepts affect the existing families Lichtheimiaceae and Cunninghamellaceae, as well as the genera Backusella and Lentamyces which recently obtained the status of families along with the Rhizopodaceae comprising Rhizopus, Sporodiniella and Syzygites. Compensatory base change analyses in the Lichtheimiaceae confirmed the lower level classification of Lichtheimia and Rhizomucor while genera such as Circinella or Syncephalastrum completely lacked compensatory base changes.
26
Altenhoff AM, Gil M, Gonnet GH, Dessimoz C. Inferring hierarchical orthologous groups from orthologous gene pairs. PLoS One 2013; 8:e53786. [PMID: 23342000 PMCID: PMC3544860 DOI: 10.1371/journal.pone.0053786] [Citation(s) in RCA: 108] [Impact Index Per Article: 9.8] [Received: 08/22/2012] [Accepted: 12/05/2012] [Indexed: 11/19/2022]
Abstract
Hierarchical orthologous groups are defined as sets of genes that have descended from a single common ancestor within a taxonomic range of interest. Identifying such groups is useful in a wide range of contexts, including inference of gene function, study of gene evolution dynamics and comparative genomics. Hierarchical orthologous groups can be derived from reconciled gene/species trees but, this being a computationally costly procedure, many phylogenomic databases work on the basis of pairwise gene comparisons instead (“graph-based” approach). To our knowledge, there is only one published algorithm for graph-based hierarchical group inference, but both its theoretical justification and performance in practice are as of yet largely uncharacterised. We establish a formal correspondence between the orthology graph and hierarchical orthologous groups. Based on that, we devise GETHOGs (“Graph-based Efficient Technique for Hierarchical Orthologous Groups”), a novel algorithm to infer hierarchical groups directly from the orthology graph, thus without needing gene tree inference nor gene/species tree reconciliation. GETHOGs is shown to correctly reconstruct hierarchical orthologous groups when applied to perfect input, and several extensions with stringency parameters are provided to deal with imperfect input data. We demonstrate its competitiveness using both simulated and empirical data. GETHOGs is implemented as a part of the freely-available OMA standalone package (http://omabrowser.org/standalone). Furthermore, hierarchical groups inferred by GETHOGs (“OMA HOGs”) on >1,000 genomes can be interactively queried via the OMA browser (http://omabrowser.org).
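As a rough illustration of what a "graph-based" approach starts from, the sketch below groups genes by the connected components of a pairwise orthology graph. This is a deliberately simplified stand-in and not the GETHOGs algorithm: it ignores the taxonomic hierarchy and the stringency parameters mentioned in the abstract. The function name `group_by_orthology` and the gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_by_orthology(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group genes into connected components of an undirected orthology graph.

    Simplifying assumption: every predicted orthologous pair is trusted, so
    any chain of pairwise relations places genes in the same group.
    """
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, Set[str]] = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    # Sort for a deterministic return order.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical pairwise orthology predictions (made-up identifiers).
example_pairs = [
    ("human_g1", "mouse_g1"),
    ("mouse_g1", "rat_g1"),
    ("human_g2", "mouse_g2"),
]
example_groups = group_by_orthology(example_pairs)
```

With perfect input such components are easy to read off; with noisy input a single spurious pair merges two groups, which is presumably one reason stricter criteria are needed in practice.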
Affiliation(s)
- Adrian M. Altenhoff
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Manuel Gil
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Gaston H. Gonnet
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Christophe Dessimoz
- Department of Computer Science, ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- EMBL-European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
27
Yu C, Desai V, Cheng L, Reifman J. QuartetS-DB: a large-scale orthology database for prokaryotes and eukaryotes inferred by evolutionary evidence. BMC Bioinformatics 2012; 13:143. [PMID: 22726705 PMCID: PMC3434046 DOI: 10.1186/1471-2105-13-143] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/20/2011] [Accepted: 05/28/2012] [Indexed: 12/15/2022]
Abstract
BACKGROUND The concept of orthology is key to decoding evolutionary relationships among genes across different species using comparative genomics. QuartetS is a recently reported algorithm for large-scale orthology detection. Based on the well-established evolutionary principle that gene duplication events discriminate paralogous from orthologous genes, QuartetS has been shown to improve orthology detection accuracy while maintaining computational efficiency. DESCRIPTION QuartetS-DB is a new orthology database constructed using the QuartetS algorithm. The database provides orthology predictions among 1621 complete genomes (1365 bacterial, 92 archaeal, and 164 eukaryotic), covering more than seven million proteins and four million pairwise orthologs. It is a major source of orthologous groups, containing more than 300,000 groups of orthologous proteins and 236,000 corresponding gene trees. The database also provides over 500,000 groups of inparalogs. In addition to its size, a distinguishing feature of QuartetS-DB is the ability to allow users to select a cutoff value that modulates the balance between prediction accuracy and coverage of the retrieved pairwise orthologs. The database is accessible at https://applications.bioanalysis.org/quartetsdb. CONCLUSIONS QuartetS-DB is one of the largest orthology resources available to date. Because its orthology predictions are underpinned by evolutionary evidence obtained from sequenced genomes, we expect its accuracy to continue to increase in future releases as the genomes of additional species are sequenced.
Affiliation(s)
- Chenggang Yu
- United States Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command, Fort Detrick, MD 21702, USA
28
Wilkins AD, Bachman BJ, Erdin S, Lichtarge O. The use of evolutionary patterns in protein annotation. Curr Opin Struct Biol 2012; 22:316-25. [PMID: 22633559 DOI: 10.1016/j.sbi.2012.05.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 04/03/2012] [Accepted: 05/01/2012] [Indexed: 01/13/2023]
Abstract
With genomic data skyrocketing, their biological interpretation remains a serious challenge. Diverse computational methods address this problem by pointing to the existence of recurrent patterns among sequence, structure, and function. These patterns emerge naturally from evolutionary variation, natural selection, and divergence--the defining features of biological systems--and they identify molecular events and shapes that underlie specificity of function and allosteric communication. Here we review these methods, and the patterns they identify in case studies and in proteome-wide applications, to infer and rationally redesign function.
Affiliation(s)
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
29
Song G, Riemer C, Dickins B, Kim HL, Zhang L, Zhang Y, Hsu CH, Hardison RC, NISC Comparative Sequencing Program, Green ED, Miller W. Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 2012; 4:586-601. [PMID: 22454131 PMCID: PMC3342878 DOI: 10.1093/gbe/evs032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Accepted: 03/19/2012] [Indexed: 12/13/2022]
Abstract
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.
Affiliation(s)
- Giltae Song
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, PA, USA.
30
Szklarczyk R, Wanschers BF, Cuypers TD, Esseling JJ, Riemersma M, van den Brand MA, Gloerich J, Lasonder E, van den Heuvel LP, Nijtmans LG, Huynen MA. Iterative orthology prediction uncovers new mitochondrial proteins and identifies C12orf62 as the human ortholog of COX14, a protein involved in the assembly of cytochrome c oxidase. Genome Biol 2012; 13:R12. [PMID: 22356826 PMCID: PMC3334569 DOI: 10.1186/gb-2012-13-2-r12] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.7] [Received: 11/29/2011] [Revised: 02/03/2012] [Accepted: 02/22/2012] [Indexed: 11/10/2022]
Abstract
Background Orthology is a central tenet of comparative genomics and ortholog identification is instrumental to protein function prediction. Major advances have been made to determine orthology relations among a set of homologous proteins. However, they depend on the comparison of individual sequences and do not take into account divergent orthologs. Results We have developed an iterative orthology prediction method, Ortho-Profile, that uses reciprocal best hits at the level of sequence profiles to infer orthology. It increases ortholog detection by 20% compared to sequence-to-sequence comparisons. Ortho-Profile predicts 598 human orthologs of mitochondrial proteins from Saccharomyces cerevisiae and Schizosaccharomyces pombe with 94% accuracy. Of these, 181 were not known to localize to mitochondria in mammals. Among the predictions of the Ortho-Profile method are 11 human cytochrome c oxidase (COX) assembly proteins that are implicated in mitochondrial function and disease. Their co-expression patterns, experimentally verified subcellular localization, and co-purification with human COX-associated proteins support these predictions. For the human gene C12orf62, the ortholog of S. cerevisiae COX14, we specifically confirm its role in negative regulation of the translation of cytochrome c oxidase. Conclusions Divergent homologs can often only be detected by comparing sequence profiles and profile-based hidden Markov models. The Ortho-Profile method takes advantage of these techniques in the quest for orthologs.
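The reciprocal-best-hit criterion named in this abstract can be sketched in a few lines. The example below is a minimal illustration on made-up similarity scores, not the Ortho-Profile method itself: Ortho-Profile applies the criterion to sequence profiles, whereas this sketch accepts any precomputed score table. The names `best_hits` and `reciprocal_best_hits`, the gene labels, and the scores are all hypothetical.

```python
from typing import Dict, Set, Tuple

# scores[(query, target)] = similarity score; higher means more similar.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its single highest-scoring target."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


# Hypothetical scores between genes of species A (a1, a2) and species B (b1, b2).
a_vs_b: Scores = {("a1", "b1"): 41.0, ("a1", "b2"): 12.0, ("a2", "b2"): 30.0}
b_vs_a: Scores = {("b1", "a1"): 39.5, ("b2", "a1"): 35.0, ("b2", "a2"): 30.0}

rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here `a2` picks `b2` as its best hit, but `b2` prefers `a1`, so only the pair (`a1`, `b1`) satisfies the criterion in both directions.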
Affiliation(s)
- Radek Szklarczyk
- Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, Nijmegen, 6500 HB, The Netherlands.
31
Powell S, Szklarczyk D, Trachana K, Roth A, Kuhn M, Muller J, Arnold R, Rattei T, Letunic I, Doerks T, Jensen LJ, von Mering C, Bork P. eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res 2012; 40:D284-9. [PMID: 22096231 PMCID: PMC3245133 DOI: 10.1093/nar/gkr1060] [Citation(s) in RCA: 388] [Impact Index Per Article: 32.3] [Received: 09/15/2011] [Revised: 10/26/2011] [Accepted: 10/26/2011] [Indexed: 11/28/2022]
Abstract
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721,801 orthologous groups, encompassing a total of 4,396,591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101,208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450,904 orthologous groups (62.5%).
Affiliation(s)
- Sean Powell
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M55 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
| | - Damian Szklarczyk
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M55 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
| | - Kalliopi Trachana
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M55 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
| | - Alexander Roth
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M55 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
| | - Michael Kuhn
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M55 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
| | - Jean Muller
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M55 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
| | - Roland Arnold
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland, Biotechnology Center, TU Dresden, 01062 Dresden, Germany, Institute of Genetics and Molecular and Cellular Biology, CNRS, INSERM, University of Strasbourg, Genetic Diagnostics Laboratory, CHU Strasbourg Nouvel Hôpital Civil, Strasbourg, France, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M55 3E1, Canada, University of Vienna, Department of Computational Systems Biology, Althanstrasse 14, 1090 Vienna, Austria and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany
- Thomas Rattei, Ivica Letunic, Tobias Doerks, Lars J. Jensen, Christian von Mering, Peer Bork (each listed with the same combined affiliation text as above)
|
32
|
Puigbò P, Wolf YI, Koonin EV. Genome-wide comparative analysis of phylogenetic trees: the prokaryotic forest of life. Methods Mol Biol 2012; 856:53-79. [PMID: 22399455 PMCID: PMC3842619 DOI: 10.1007/978-1-61779-585-5_3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article, we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."
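The idea of weighting tree splits by bootstrap support can be illustrated with a minimal sketch. It is not the published BSD formula, which the abstract does not give: the normalisation, the function names and the toy trees below are all assumptions made for illustration. Each tree is represented as a mapping from a canonical bipartition of the taxa to a bootstrap support in [0, 1].

```python
from typing import Dict, FrozenSet, Iterable

Split = FrozenSet[str]


def canonical_split(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Represent a bipartition by the side that lacks the reference taxon."""
    side_set, taxa_set = frozenset(side), frozenset(taxa)
    reference = min(taxa_set)  # arbitrary but deterministic reference taxon
    return taxa_set - side_set if reference in side_set else side_set


def split_distance(splits_a: Iterable[Split], splits_b: Iterable[Split]) -> float:
    """Fraction of splits present in only one tree (0 = same splits, 1 = none shared)."""
    a, b = set(splits_a), set(splits_b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / total


def support_weighted_distance(boot_a: Dict[Split, float],
                              boot_b: Dict[Split, float]) -> float:
    """Illustrative weighting (an assumption): each split counts by its bootstrap support."""
    total = sum(boot_a.values()) + sum(boot_b.values())
    if total == 0:
        return 0.0
    shared = sum(boot_a[s] + boot_b[s] for s in boot_a.keys() & boot_b.keys())
    return 1.0 - shared / total


# Toy five-taxon trees (hypothetical data): split -> bootstrap support.
taxa = ["A", "B", "C", "D", "E"]
tree_a = {
    canonical_split({"A", "B"}, taxa): 0.9,
    canonical_split({"D", "E"}, taxa): 0.3,
}
tree_b = {
    canonical_split({"A", "B"}, taxa): 1.0,
    canonical_split({"C", "E"}, taxa): 0.2,
}
unweighted = split_distance(tree_a, tree_b)
weighted = support_weighted_distance(tree_a, tree_b)
print(unweighted, weighted)
```

In this toy case the two trees disagree on one of their two splits, so the unweighted distance is 0.5. The conflicting splits are poorly supported, so the weighted distance drops to roughly 0.21, which is the kind of robustness the abstract attributes to treating splits differentially.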
Affiliation(s)
- Pere Puigbò
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health. Bethesda, Maryland 20894. USA
| | - Yuri I. Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health. Bethesda, Maryland 20894. USA
| | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health. Bethesda, Maryland 20894. USA
|
33
|
Tetushkin EY. Genetic aspects of genealogy. RUSS J GENET+ 2011. [DOI: 10.1134/s1022795411110160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
34
|
Global analysis of serine-threonine protein kinase genes in Neurospora crassa. EUKARYOTIC CELL 2011; 10:1553-64. [PMID: 21965514 DOI: 10.1128/ec.05140-11] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.3] [Indexed: 12/27/2022]
Abstract
Serine/threonine (S/T) protein kinases are crucial components of diverse signaling pathways in eukaryotes, including the model filamentous fungus Neurospora crassa. In order to assess the importance of S/T kinases to Neurospora biology, we embarked on a global analysis of 86 S/T kinase genes in Neurospora. We were able to isolate viable mutants for 77 of the 86 kinase genes. Of these, 57% exhibited at least one growth or developmental phenotype, with a relatively large fraction (40%) possessing a defect in more than one trait. S/T kinase knockouts were subjected to chemical screening using a panel of eight chemical treatments, with 25 mutants exhibiting sensitivity or resistance to at least one chemical. This brought the total percentage of S/T mutants with phenotypes in our study to 71%. Mutants lacking apg-1, an S/T kinase required for autophagy in other organisms, possessed the greatest number of phenotypes, with defects in asexual and sexual growth and development and in altered sensitivity to five chemical treatments. We showed that NCU02245/stk-19 is required for chemotropic interactions between female and male cells during mating. Finally, we demonstrated allelism between the S/T kinase gene NCU00406 and velvet (vel), encoding a p21-activated protein kinase (PAK) gene important for asexual and sexual growth and development in Neurospora.
|
35
|
Tzika AC, Helaers R, Schramm G, Milinkovitch MC. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles. EvoDevo 2011; 2:19. [PMID: 21943375 PMCID: PMC3192992 DOI: 10.1186/2041-9139-2-19] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 04/23/2011] [Accepted: 09/26/2011] [Indexed: 12/05/2022] Open
Abstract
Background: Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation.
Results: Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes.
Conclusions: The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics.
Affiliation(s)
- Athanasia C Tzika
- Laboratory of Artificial & Natural Evolution (LANE), Dept. of Genetics & Evolution, University of Geneva, Sciences III, 30, Quai Ernest-Ansermet, 1211 Genève 4, Switzerland.
|
36
|
Hu Y, Flockhart I, Vinayagam A, Bergwitz C, Berger B, Perrimon N, Mohr SE. An integrative approach to ortholog prediction for disease-focused and other functional studies. BMC Bioinformatics 2011; 12:357. [PMID: 21880147 PMCID: PMC3179972 DOI: 10.1186/1471-2105-12-357] [Citation(s) in RCA: 517] [Impact Index Per Article: 39.8] [Received: 05/11/2011] [Accepted: 08/31/2011] [Indexed: 12/12/2022] Open
Abstract
Background: Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward.
Results: We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist).
Conclusions: DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
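The trade-off described above, between casting a wide net and keeping only the highest-confidence predictions, can be sketched as a simple vote count over several prediction tools. This is an illustrative assumption about how such an integration might be expressed, not DIOPT's actual implementation; the tool labels, gene identifiers and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (query gene, candidate ortholog)


def integrate_predictions(predictions: Dict[str, Iterable[Pair]]) -> Dict[Pair, List[str]]:
    """Collect, for every predicted pair, the sorted list of tools that report it."""
    support = defaultdict(set)
    for tool, pairs in predictions.items():
        for pair in pairs:
            support[pair].add(tool)
    return {pair: sorted(tools) for pair, tools in support.items()}


def filter_by_votes(support: Dict[Pair, List[str]], min_votes: int = 1) -> Dict[Pair, List[str]]:
    """Keep pairs reported by at least `min_votes` tools."""
    return {pair: tools for pair, tools in support.items() if len(tools) >= min_votes}


# Hypothetical output of three placeholder tools for one query gene.
predictions = {
    "tool_A": [("geneX", "orth1"), ("geneX", "orth2")],
    "tool_B": [("geneX", "orth1")],
    "tool_C": [("geneX", "orth1"), ("geneX", "orth3")],
}
support = integrate_predictions(predictions)
wide_net = filter_by_votes(support, min_votes=1)  # sensitive: any tool suffices
strict = filter_by_votes(support, min_votes=2)    # specific: needs agreement
print(sorted(wide_net), sorted(strict))
```

With a threshold of one vote, all three candidates are returned; raising the threshold to two keeps only the candidate that several tools agree on, at the cost of dropping predictions made by a single tool.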
Affiliation(s)
- Yanhui Hu
- Drosophila RNAi Screening Center, Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA
|
37
|
Abstract
The TOR (target of rapamycin) kinase is present in nearly all eukaryotic organisms and regulates a wealth of biological processes collectively contributing to cell growth. The genome of the model plant Arabidopsis contains a single TOR gene and two RAPTOR (regulatory associated protein of TOR)/KOG1 (Kontroller of growth 1) and GβL/LST8 (G-protein β-subunit-like/lethal with Sec thirteen 8) genes but, in contrast with other organisms, plants appear to be resistant to rapamycin. Disruption of the RAPTOR1 and TOR genes in Arabidopsis results in an early arrest of embryo development. Plants that overexpress the TOR mRNA accumulate more leaf and root biomass, produce more seeds and are more resistant to stress. Conversely, the down-regulation of TOR by constitutive or inducible RNAi (RNA interference) leads to a reduced organ growth, to an early senescence and to severe transcriptomic and metabolic perturbations, including accumulation of sugars and amino acids. It thus seems that plant growth is correlated to the level of TOR expression. We have also investigated the effect of reduced TOR expression on tissue organization and cell division. We suggest that, like in other eukaryotes, the plant TOR kinase could be one of the main contributors to the link between environmental cues and growth processes.
|
38
|
Sjölander K, Datta RS, Shen Y, Shoffner GM. Ortholog identification in the presence of domain architecture rearrangement. Brief Bioinform 2011; 12:413-22. [PMID: 21712343 PMCID: PMC3178056 DOI: 10.1093/bib/bbr036] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 02/07/2023] Open
Abstract
Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.
Affiliation(s)
- Kimmen Sjölander
- 308C Stanley Hall #1762, Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
|
39
|
Lipinski KA, Puchta O, Surendranath V, Kudla M, Golik P. Revisiting the yeast PPR proteins--application of an Iterative Hidden Markov Model algorithm reveals new members of the rapidly evolving family. Mol Biol Evol 2011; 28:2935-48. [PMID: 21546354 DOI: 10.1093/molbev/msr120] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 01/20/2023] Open
Abstract
Pentatricopeptide repeat (PPR) proteins are the largest known RNA-binding protein family, and are found in all eukaryotes, being particularly abundant in higher plants. PPR proteins localize mostly to mitochondria and chloroplasts, and many were shown to modulate organellar genome expression on the posttranscriptional level. Although the genomes of land plants encode hundreds of PPR proteins, only a few have been identified in Fungi and Metazoa. As the current PPR motif profiles are built mainly on the basis of the predominant plant sequences, they are unlikely to be optimal for detecting fungal and animal members of the family, and many putative PPR proteins in these genomes may remain undetected. In order to verify this hypothesis, we designed a hidden Markov model-based bioinformatic tool called Supervised Clustering-based Iterative Phylogenetic Hidden Markov Model algorithm for the Evaluation of tandem Repeat motif families (SCIPHER) using sequence data from orthologous clusters from available yeast genomes. This approach allowed us to assign 12 new proteins in Saccharomyces cerevisiae to the PPR family. Similarly, in other yeast species, we obtained a 5-fold increase in the detection of PPR motifs, compared with the previous tools. All the newly identified S. cerevisiae PPR proteins localize in the mitochondrion and are a part of the RNA processing interaction network. Furthermore, the yeast PPR proteins seem to undergo an accelerated divergent evolution. Analysis of single and double amino acid substitutions in the Dmr1 protein of S. cerevisiae suggests that cooperative interactions between motifs and pseudoreversion could be the force driving this rapid evolution.
Affiliation(s)
- Kamil A Lipinski
- Institute of Genetics and Biotechnology, University of Warsaw, Warsaw, Poland
|
40
|
Vellozo AF, Véron AS, Baa-Puyoulet P, Huerta-Cepas J, Cottret L, Febvay G, Calevro F, Rahbé Y, Douglas AE, Gabaldón T, Sagot MF, Charles H, Colella S. CycADS: an annotation database system to ease the development and update of BioCyc databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2011; 2011:bar008. [PMID: 21474551 PMCID: PMC3072769 DOI: 10.1093/database/bar008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
Abstract
In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including, for example, KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. For comparative analyses, the AcypiCyc database webpage also includes two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Linked to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms.
Database URL: http://www.cycadsys.org
Affiliation(s)
- Augusto F Vellozo
- UMR203 BF2I, Biologie Fonctionnelle Insectes et Interactions, INRA, INSA-Lyon, Université de Lyon, 20 av. A. Einstein, F-69621 Villeurbanne, France.
|
41
|
Boulais J, Trost M, Landry CR, Dieckmann R, Levy ED, Soldati T, Michnick SW, Thibault P, Desjardins M. Molecular characterization of the evolution of phagosomes. Mol Syst Biol 2011; 6:423. [PMID: 20959821 PMCID: PMC2990642 DOI: 10.1038/msb.2010.80] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Received: 05/25/2010] [Accepted: 09/15/2010] [Indexed: 11/23/2022] Open
Abstract
First large-scale comparative proteomics/phosphoproteomics study characterizing some of the key steps that contributed to the remodeling of phagosomes that occurred during evolution. Comparison of profiling analyses of isolated phagosomes from three distant organisms (Dictyostelium, Drosophila, and mouse) revealed a protein core that defines a potential ‘ancient' phagosome and a set of 50 proteins that emerged while adaptive immunity was already well established. Gene duplication events of mouse phagosome paralogs occurred mostly in Bilateria and Euteleostomi, coinciding with the emergence of innate and adaptive immunity, and thus, provided the functional innovations needed for the establishment of these two crucial evolutionary steps of the immune system. Phosphoproteomics of isolated phagosomes from the same three distant species indicate that the phagosome phosphoproteome has been extensively modified during evolution. Still, some phosphosites have been maintained for >1.2 billion years, and thus, highlight their particular significance in the regulation of key phagosomal functions.
Phagocytosis is the process by which multiple cell types internalize large particulate material from the external milieu. The functional properties of phagosomes are acquired through a complex maturation process, referred to as phagolysosome biogenesis. This pathway involves a series of rapid interactions with organelles of the endocytic apparatus, enabling the gradual transformation of newly formed phagosomes into phagolysosomes in which proteolytic degradation occurs. The degradative environment encountered in the phagosome lumen has enabled the use of phagocytosis as a predation mechanism for feeding (phagotrophy) in amoeba, whereas multicellular organisms utilize this process as a defense mechanism to kill microbes and, in jawed vertebrates (fish), initiate a sustained immune response. High-throughput proteomics profiling of isolated phagosomes has been tremendously helpful for the molecular comprehension of this organelle. This approach is achieved by feeding low buoyancy latex beads to phagocytic cells, enabling the subsequent isolation of latex bead-containing phagosomes, away from all the other cell organelles, by a single isopycnic centrifugation in a sucrose gradient. In order to characterize some of the key steps that contributed to the remodeling of phagosomes during evolution, we isolated this organelle from three distant organisms: the amoeba Dictyostelium discoideum, the fruit fly Drosophila melanogaster, and mouse (Mus musculus) that use phagocytosis for different purposes, and performed detailed proteomics and phosphoproteomics analyses with unparalleled protein coverage for this organelle (two- to four-fold enhancements in identified proteins). In order to establish the origin of the mouse phagosome proteome, we performed comparative analyses among 39 taxa including plants/algae, unicellular organisms, fungi, and more complex animal multicellular organisms.
These genomic comparisons indicated that a large proportion of the mouse phagosome proteome is of ancient origin (73.1% of the proteome is conserved in eukaryotic organisms) (Figure 2A). This stresses the fact that phagocytosis is a very ancient process, as shown by its possible involvement in the emergence of eukaryotic cells (eukaryogenesis). Indeed, we identified close to 300 phagosome mouse proteins also present on Drosophila and Dictyostelium phagosomes, defining a potential ‘ancient' core of proteins from which the immune functions of phagosomes likely evolved. Around 16.7% of the mouse phagosome proteins appeared in organisms that use phagocytosis for innate immunity (Bilateria to Chordata), whereas 10.2% appeared in Euteleostomi or Tetrapoda where phagosomes have an important function in linking the killing of microorganisms with the development of a specific sustained immune response following antigen recognition. The phagosome is made of molecules taken from a variety of sources within the cell, including the cytoplasm, the cytoskeleton and membrane organelles. Despite the evolution and diversification of these various cellular systems, the mammalian phagosome proteome is made preferentially of ancient proteins (Figure 2B). Comparison of functional annotation during evolution highlighted the emergence of specific phagosomal functions at various steps during evolution (Figure 2C). Some of these proteins and their point of origin during evolution are highlighted in Figure 2D. Strikingly, we identified in Tetrapods a set of 50 proteins that arose while adaptive immunity was already well established in teleosts (fish), indicating that the phagocytic system is still evolving. Our study highlights the fact that the functional properties of phagosomes emerged by the remodeling of ancient molecules, the addition of novel components, and the duplication of existing proteins (paralogs) leading to the formation of molecular machines of mixed origin. 
Gene duplication is a process that contributed continuously to the complexification of the mouse proteome during evolution. In sharp contrast, paralog analysis indicated that the phagosome proteome was mainly reorganized through two periods of gene duplication, in Bilateria and Euteleostomi, coinciding with the emergence of adaptive immunity (in jawed fish), and innate immunity (at the split between Metazoa and Bilateria). These results strongly suggest that selective constraints may have favored the maintenance of phagosome paralogs to ensure the establishment of novel functions associated with this organelle at these two crucial evolutionary steps of the immune system. The emergence of genes associated to the MHC locus in mammals that appeared originally in the genome of jawed fishes, contributed to the development of complex molecular mechanisms linking innate (our immune system that defends the host from infection in a non-specific manner) and adaptive immunity (the part of the immune system triggered specifically after antigen recognition). Several of the genes of this locus encode proteins known to have important functions in antigen presentation, such as subunits of the immunoproteasome (LMP2 and LMP7), MHC class I and class II molecules, as well as tapasin and the transporter associated with antigen processing (TAP1 and TAP2), involved in the transport and loading of peptides on MHC class I molecules (Figure 6). In addition to their ability to present peptides on MHC class II molecules, phagosomes of vertebrates have been shown to be competent for the presentation of exogenous peptides on MHC class I molecules, a process referred to as cross-presentation. From a functional point of view, the involvement of phagosomes in antigen cross-presentation is the outcome of the successful integration of a wide range of multimolecular components that emerged throughout evolution (Figure 6). 
The trimming of exogenous proteins into small peptides that can be loaded on MHC class I molecules is inherited from the phagotrophic properties of unicellular organisms, where internalized bacteria are degraded into basic molecules and used as a source of nutrients. Ancient processes have therefore been co-opted (the use of an existing biological structure or feature for a new function) for new functionalities. A summarizing model of the various steps that enabled phagosome antigen presentation is presented in Figure 6. This model highlights the fact that although antigen presentation is unique to evolutionary recent phagosomes (starting in jawed fishes about 450 million years ago), it uses and integrates molecular machines composed of proteins that emerged throughout evolution. In summary, we present here the first large-scale comparative proteomics/phosphoproteomics study characterizing some of the key evolutionary steps that contributed to the remodeling of phagosomes during evolution. Functional properties of this organelle emerged by the remodeling of ancient molecules, the addition of novel components, the extensive adaption of protein phosphorylation sites and the duplication of existing proteins leading to the formation of molecular machines of mixed origin. Amoeba use phagocytosis to internalize bacteria as a source of nutrients, whereas multicellular organisms utilize this process as a defense mechanism to kill microbes and, in vertebrates, initiate a sustained immune response. By using a large-scale approach to identify and compare the proteome and phosphoproteome of phagosomes isolated from distant organisms, and by comparative analysis over 39 taxa, we identified an ‘ancient' core of phagosomal proteins around which the immune functions of this organelle have likely organized. 
Our data indicate that a larger proportion of the phagosome proteome, compared with the whole cell proteome, has been acquired through gene duplication at a period coinciding with the emergence of innate and adaptive immunity. Our study also characterizes in detail the acquisition of novel proteins and the significant remodeling of the phagosome phosphoproteome that contributed to modify the core constituents of this organelle in evolution. Our work thus provides the first thorough analysis of the changes that enabled the transformation of the phagosome from a phagotrophic compartment into an organelle fully competent for antigen presentation.
Affiliation(s)
- Jonathan Boulais
- Département de Pathologie et Biologie Cellulaire, Université de Montréal, Montréal, Québec, Canada
|
42
|
Pryszcz LP, Huerta-Cepas J, Gabaldón T. MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Res 2010; 39:e32. [PMID: 21149260 PMCID: PMC3061081 DOI: 10.1093/nar/gkq953] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.3] [Indexed: 11/27/2022] Open
Abstract
Reliable prediction of orthology is central to comparative genomics. Approaches based on phylogenetic analyses closely resemble the original definition of orthology and paralogy and are known to be highly accurate. However, the large computational cost associated with these analyses is a limiting factor that often prevents their use at genomic scales. Recently, several projects have addressed the reconstruction of large collections of high-quality phylogenetic trees from which orthology and paralogy relationships can be inferred. This provides us with the opportunity to infer the evolutionary relationships of genes from multiple, independent phylogenetic trees. Using such a strategy, we combine phylogenetic information derived from different databases to predict orthology and paralogy relationships for 4.1 million proteins in 829 fully sequenced genomes. We show that the number of independent sources from which a prediction is made, as well as the level of consistency across predictions, can be used as reliable confidence scores. A webserver has been developed to easily access these data (http://orthology.phylomedb.org), which provides users with a global repository of phylogeny-based orthology and paralogy predictions.
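The two confidence measures named in the abstract above (number of independent sources and consistency across predictions) can be illustrated with a minimal sketch. The function name, the input tuple layout and the exact definition of consistency as the fraction of trees supporting orthology are assumptions made for illustration; the abstract does not give the formula.

```python
from collections import defaultdict


def consistency_scores(predictions):
    """Summarise per-tree orthology calls for each gene pair.

    predictions: iterable of (source, gene_a, gene_b, is_ortholog) tuples,
    one per phylogenetic tree that contains both genes (hypothetical layout).
    """
    votes = defaultdict(list)    # pair -> list of True/False calls
    sources = defaultdict(set)   # pair -> databases that contributed a call
    for source, gene_a, gene_b, is_ortholog in predictions:
        pair = tuple(sorted((gene_a, gene_b)))  # order-independent key
        votes[pair].append(bool(is_ortholog))
        sources[pair].add(source)
    return {
        pair: {
            # assumed definition: share of trees that call the pair orthologous
            "consistency": sum(calls) / len(calls),
            "n_trees": len(calls),
            "n_sources": len(sources[pair]),
        }
        for pair, calls in votes.items()
    }


if __name__ == "__main__":
    example = [
        ("dbA", "g1", "g2", True),
        ("dbA", "g2", "g1", True),
        ("dbB", "g1", "g2", False),
        ("dbB", "g1", "g3", True),
    ]
    for pair, summary in consistency_scores(example).items():
        print(pair, summary)
```

A pair supported by many trees from several sources with a consistency near 1 would be treated as a high-confidence ortholog call under these assumptions; a pair seen in a single tree carries little evidence regardless of its score.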
Affiliation(s)
- Leszek P Pryszcz
- Bioinformatics and Genomics Programme, Centre de Regulació Genòmica (CRG), Universitat Pompeu Fabra, Dr. Aiguader, 88. 08003, Barcelona, Spain
|
43
|
Altenhoff AM, Schneider A, Gonnet GH, Dessimoz C. OMA 2011: orthology inference among 1000 complete genomes. Nucleic Acids Res 2010; 39:D289-94. [PMID: 21113020 PMCID: PMC3013747 DOI: 10.1093/nar/gkq1238] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.9] [Indexed: 01/16/2023]
Abstract
OMA (Orthologous MAtrix) is a database that identifies orthologs among publicly available, complete genomes. Initiated in 2004, the project is at its 11th release. It now includes 1000 genomes, making it one of the largest resources of its kind. Here, we describe recent developments in terms of species covered; the algorithmic pipeline—in particular regarding the treatment of alternative splicing, and new features of the web (OMA Browser) and programming interface (SOAP API). In the second part, we review the various representations provided by OMA and their typical applications. The database is publicly accessible at http://omabrowser.org.
Affiliation(s)
- Adrian M. Altenhoff
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland, Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland and University of Edinburgh, Institute of Evolutionary Biology, West Mains Rd, Edinburgh, EH9 3JT, UK
- Adrian Schneider
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland, Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland and University of Edinburgh, Institute of Evolutionary Biology, West Mains Rd, Edinburgh, EH9 3JT, UK
- Gaston H. Gonnet
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland, Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland and University of Edinburgh, Institute of Evolutionary Biology, West Mains Rd, Edinburgh, EH9 3JT, UK
- Christophe Dessimoz
- ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland, Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zurich, Switzerland and University of Edinburgh, Institute of Evolutionary Biology, West Mains Rd, Edinburgh, EH9 3JT, UK
- *To whom correspondence should be addressed. Tel: +41 44 6327472; Fax: +41 44 6321374;
|
44
|
Huerta-Cepas J, Capella-Gutierrez S, Pryszcz LP, Denisov I, Kormes D, Marcet-Houben M, Gabaldón T. PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions. Nucleic Acids Res 2010; 39:D556-60. [PMID: 21075798 PMCID: PMC3013701 DOI: 10.1093/nar/gkq1109] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.1] [Indexed: 11/13/2022]
Abstract
The growing availability of complete genomic sequences from diverse species has brought about the need to scale up phylogenomic analyses, including the reconstruction of large collections of phylogenetic trees. Here, we present the third version of PhylomeDB (http://phylomeDB.org), a public database for genome-wide collections of gene phylogenies (phylomes). Currently, PhylomeDB is the largest phylogenetic repository and hosts 17 phylomes, comprising 416,093 trees and 165,840 alignments. It is also a major source for phylogeny-based orthology and paralogy predictions, covering about 5 million proteins in 717 fully-sequenced genomes. For each protein-coding gene in a seed genome, the database provides original and processed alignments, phylogenetic trees derived from various methods and phylogeny-based predictions of orthology and paralogy relationships. The new version of phylomeDB has been extended with novel data access and visualization features, including the possibility of programmatic access. Available seed species include model organisms such as human, yeast, Escherichia coli or Arabidopsis thaliana, but also alternative model species such as the human pathogen Candida albicans, or the pea aphid Acyrtosiphon pisum. Finally, PhylomeDB is currently being used by several genome sequencing projects that couple the genome annotation process with the reconstruction of the corresponding phylome, a strategy that provides relevant evolutionary insights.
Affiliation(s)
- Jaime Huerta-Cepas
- Bioinformatics and Genomics Programme, Centre de Regulació Genòmica, 08003 Barcelona, Spain
|
45
|
Huerta-Cepas J, Gabaldón T. Assigning duplication events to relative temporal scales in genome-wide studies. Bioinformatics 2010; 27:38-45. [PMID: 21075746 DOI: 10.1093/bioinformatics/btq609] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.4] [Indexed: 12/18/2022]
Abstract
MOTIVATION: In genome-wide analyses, the relative age of gene duplications is often estimated by measuring the rate of synonymous substitutions (dS) between paralogous sequences. On the other hand, recent studies have shown the feasibility of inferring, at genomic scales, the relative age of duplication events from the topology of gene family trees. This represents a promising alternative for large surveys that require an automatic methodology to establish a timeline of duplication events and that are usually limited to the use of dS, which has known limitations such as fast saturation of the signal. However, the two measures have never been compared in a common framework. RESULTS: Topology-based placement of duplications on a relative time scale corresponding to periods between speciation events was found to be highly consistent, providing the same placement for 67-84% of a reliable set of gene pairs duplicated in a single event. For recent evolutionary periods, dS and topological measures showed a strong correlation. We conclude that the topology-based approach is more appropriate for assigning duplications to temporal scales when analyses need to include ancient events, and that the study of recent duplications may benefit from a combination of dS and topology information.
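As a rough illustration of topology-based dating, the sketch below assigns a duplication to the most recent clade of a seed lineage that contains every species found below the duplication node. This is a simplified reading of the idea, not the authors' implementation; the function name, the clade names and the nested-set input are hypothetical.

```python
def date_duplication(descendant_species, nested_clades):
    """Return the name of the youngest clade that covers a duplication.

    descendant_species: species observed below the duplication node.
    nested_clades: list of (name, set_of_species) along the seed lineage,
        ordered from most recent (smallest) to most ancient (largest).
    Returns None when no listed clade contains all descendant species.
    """
    species = set(descendant_species)
    for name, members in nested_clades:
        if species <= members:  # every descendant species falls in this clade
            return name
    return None


if __name__ == "__main__":
    # Hypothetical seed lineage for a human gene, youngest clade first.
    lineage = [
        ("Homo sapiens", {"human"}),
        ("Primates", {"human", "chimp"}),
        ("Mammals", {"human", "chimp", "mouse"}),
        ("Vertebrates", {"human", "chimp", "mouse", "zebrafish"}),
    ]
    print(date_duplication({"human", "mouse"}, lineage))
```

Because the placement depends only on which species sit below the node, it does not saturate with time the way dS does, which is the property the abstract highlights for ancient events.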
|
46
|
Waterhouse RM, Zdobnov EM, Tegenfeldt F, Li J, Kriventseva EV. OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011. Nucleic Acids Res 2010; 39:D283-8. [PMID: 20972218 PMCID: PMC3013786 DOI: 10.1093/nar/gkq930] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.8] [Indexed: 11/12/2022]
Abstract
The concept of homology drives speculation on a gene's function in any given species when its biological roles in other species are characterized. With reference to a specific species radiation, homologous relations define orthologs, i.e. descendants from a single gene of the ancestor. The large-scale delineation of gene genealogies is a challenging task, and the numerous approaches to the problem reflect the importance of the concept of orthology as a cornerstone for comparative studies. Here, we present the updated OrthoDB catalog of eukaryotic orthologs delineated at each radiation of the species phylogeny in an explicitly hierarchical manner for over 100 species of vertebrates, arthropods and fungi (including the metazoa level). New database features include functional annotations, and quantification of evolutionary divergence and relations among orthologous groups. The interface features extended phyletic profile querying and enhanced text-based searches. The ever-increasing sampling of sequenced eukaryotic genomes brings a clearer account of the majority of gene genealogies that will facilitate informed hypotheses of gene function in newly sequenced genomes. Furthermore, uniform analysis across lineages as different as vertebrates, arthropods and fungi with divergence levels varying from several to hundreds of millions of years will provide essential data for uncovering and quantifying long-term trends of gene evolution. OrthoDB is freely accessible from http://cegg.unige.ch/orthodb.
Affiliation(s)
- Robert M Waterhouse
- Department of Genetic Medicine and Development, University of Geneva Medical School, Swiss Institute of Bioinformatics, rue Michel-Servet 1, 1211 Geneva, Switzerland
|
47
|
Abstract
Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species.
Affiliation(s)
- Christian Frech
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Nansheng Chen
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada
|
48
|
Criscuolo A, Gribaldo S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 2010; 10:210. [PMID: 20626897 PMCID: PMC3017758 DOI: 10.1186/1471-2148-10-210] [Citation(s) in RCA: 911] [Impact Index Per Article: 65.1] [Received: 02/10/2010] [Accepted: 07/13/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, as well as other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow precise manual curation of each dataset, there is a real need for efficient bioinformatic tools dedicated to this alignment character trimming step. RESULTS: Here we present new software, named BMGE (Block Mapping and Gathering with Entropy), that is designed to select regions in a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. Calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish between biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered unsuited for phylogenetic inference and are removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity. CONCLUSIONS: BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
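The entropy-based trimming idea in the abstract above can be sketched in a few lines. This is a deliberately simplified stand-in, not BMGE itself: it uses plain Shannon entropy per alignment column, with no BLOSUM or PAM weighting and no handling of contiguous blocks, and the function names and the default threshold are assumptions.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the non-gap characters in one column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return float("inf")  # an all-gap column carries no usable signal
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def trim_alignment(sequences, threshold=1.0):
    """Drop columns whose entropy exceeds the threshold.

    sequences: equal-length aligned strings.
    Returns (trimmed_sequences, kept_column_indices).
    """
    columns = list(zip(*sequences))
    keep = [i for i, col in enumerate(columns)
            if column_entropy(col) <= threshold]
    trimmed = ["".join(seq[i] for i in keep) for seq in sequences]
    return trimmed, keep


if __name__ == "__main__":
    alignment = ["ACDE", "ACFE", "ACGE", "ACHE"]
    trimmed, kept = trim_alignment(alignment, threshold=1.0)
    print(trimmed, kept)
```

In the toy alignment, the third column holds four different residues and is dropped, while the three invariant columns are kept. Weighting the score with a substitution matrix, as the abstract describes, would additionally tolerate columns whose variation is biologically expected.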
Affiliation(s)
- Alexis Criscuolo
- Institut Pasteur, Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Département de Microbiologie, 25 rue du Dr Roux, 75015 Paris, France
- Simonetta Gribaldo
- Institut Pasteur, Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Département de Microbiologie, 25 rue du Dr Roux, 75015 Paris, France
|
49
|
DeathBase: a database on structure, evolution and function of proteins involved in apoptosis and other forms of cell death. Cell Death Differ 2010; 17:735-6. [PMID: 20383157 DOI: 10.1038/cdd.2009.215] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
|
50
|
Thomas PD. GIGA: a simple, efficient algorithm for gene tree inference in the genomic age. BMC Bioinformatics 2010; 11:312. [PMID: 20534164 PMCID: PMC2905364 DOI: 10.1186/1471-2105-11-312] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 08/19/2009] [Accepted: 06/09/2010] [Indexed: 11/10/2022]
Abstract
Background: Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results: We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions: GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
Affiliation(s)
- Paul D Thomas
- Evolutionary Systems Biology Group, SRI International, Menlo Park, CA, USA.
|