1
Singleton M, Eisen M. Leveraging genomic redundancy to improve inference and alignment of orthologous proteins. G3 (BETHESDA, MD.) 2023; 13:jkad222. [PMID: 37770067 PMCID: PMC10700111 DOI: 10.1093/g3journal/jkad222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 09/11/2023] [Accepted: 09/19/2023] [Indexed: 10/03/2023]
Abstract
Identifying protein sequences with common ancestry is a core task in bioinformatics and evolutionary biology. However, methods for inferring and aligning such sequences in annotated genomes have not kept pace with the increasing scale and complexity of the available data. Thus, in this work, we implemented several improvements to the traditional methodology that more fully leverage the redundancy of closely related genomes and the organization of their annotations. Two highlights include the application of the more flexible k-clique percolation algorithm for identifying clusters of orthologous proteins and the development of a novel technique for removing poorly supported regions of alignments with a phylogenetic hidden Markov model (phylo-HMM). In making the latter, we wrote a fully documented Python package Homomorph that implements standard HMM algorithms and created a set of tutorials to promote its use by a wide audience. We applied the resulting pipeline to a set of 33 annotated Drosophila genomes, generating 22,813 orthologous groups and 8,566 high-quality alignments.
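The k-clique percolation idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a toy graph whose nodes are hypothetical protein IDs and whose edges stand for reciprocal sequence hits; the function name, the brute-force clique search, and the data are illustrative assumptions, not the authors' pipeline.

```python
from itertools import combinations

def k_clique_communities(edges, k=3):
    """Return communities formed by k-cliques that share k-1 nodes."""
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate all k-cliques by brute force (only suitable for toy graphs).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: merge two cliques when they share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    # Collect the nodes covered by each merged group of cliques.
    groups = {}
    for i, c in enumerate(cliques):
        groups.setdefault(find(i), set()).update(c)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical hit graph: two triangles share an edge, a third shares one node.
edges = [
    ("a1", "b1"), ("a1", "c1"), ("b1", "c1"),
    ("b1", "d1"), ("c1", "d1"),
    ("d1", "e2"), ("e2", "f2"), ("d1", "f2"),
    ("x", "y"),
]
communities = k_clique_communities(edges, k=3)
print(communities)
```

In this toy graph the node `d1` ends up in two communities, which shows how clique percolation allows overlapping clusters, while the isolated pair `x`, `y` belongs to none because it is not part of any triangle.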
Affiliation(s)
- Marc Singleton
  - Howard Hughes Medical Institute, University of California Berkeley, Berkeley, CA 94720, USA
- Michael Eisen
  - Howard Hughes Medical Institute, University of California Berkeley, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA 94720, USA
2
Pir MS, Bilgin HI, Sayici A, Coşkun F, Torun FM, Zhao P, Kang Y, Cevik S, Kaplan O. ConVarT: a search engine for matching human genetic variants with variants from non-human species. Nucleic Acids Res 2022; 50:D1172-D1178. [PMID: 34718716 PMCID: PMC8728286 DOI: 10.1093/nar/gkab939] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/06/2021] [Revised: 09/07/2021] [Accepted: 10/14/2021] [Indexed: 01/16/2023] Open
Abstract
The availability of genetic variants, together with phenotypic annotations from model organisms, facilitates comparing these variants with equivalent variants in humans. However, existing databases and search tools do not make it easy to scan for equivalent variants, namely 'matching variants' (MatchVars) between humans and other organisms. Therefore, we developed an integrated search engine called ConVarT (http://www.convart.org/) for matching variants between humans, mice, and Caenorhabditis elegans. ConVarT incorporates annotations (including phenotypic and pathogenic) into variants, and these previously unexploited phenotypic MatchVars from mice and C. elegans can give clues about the functional consequence of human genetic variants. Our analysis shows that many phenotypic variants in different genes from mice and C. elegans, so far, have no counterparts in humans, and thus, can be useful resources when evaluating a relationship between a new human mutation and a disease.
Affiliation(s)
- Mustafa S Pir
  - Rare Disease Laboratory, School of Life and Natural Sciences, Abdullah Gul University, Kayseri 38090, Turkey
- Halil I Bilgin
  - Department of Computer Engineering, Abdullah Gul University, Kayseri 38090, Turkey
- Ahmet Sayici
  - Department of Computer Engineering, Abdullah Gul University, Kayseri 38090, Turkey
- Fatih Coşkun
  - Department of Computer Engineering, Abdullah Gul University, Kayseri 38090, Turkey
- Furkan M Torun
  - Rare Disease Laboratory, School of Life and Natural Sciences, Abdullah Gul University, Kayseri 38090, Turkey
- Pei Zhao
  - SunyBiotech Co., Ltd, Fuzhou 35000, China
- Sebiha Cevik
  - Rare Disease Laboratory, School of Life and Natural Sciences, Abdullah Gul University, Kayseri 38090, Turkey
- Oktay I Kaplan
  - Rare Disease Laboratory, School of Life and Natural Sciences, Abdullah Gul University, Kayseri 38090, Turkey
3
Nevers Y, Kress A, Defosset A, Ripp R, Linard B, Thompson JD, Poch O, Lecompte O. OrthoInspector 3.0: open portal for comparative genomics. Nucleic Acids Res 2020; 47:D411-D418. [PMID: 30380106 PMCID: PMC6323921 DOI: 10.1093/nar/gky1068] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/12/2018] [Accepted: 10/19/2018] [Indexed: 01/08/2023] Open
Abstract
OrthoInspector is one of the leading software suites for orthology relations inference. In this paper, we describe a major redesign of the OrthoInspector online resource along with a significant increase in the number of species: 4753 organisms are now covered across the three domains of life, making OrthoInspector the most exhaustive orthology resource to date in terms of covered species (excluding viruses). The new website integrates original data exploration and visualization tools in an ergonomic interface. Distributions of protein orthologs are represented by heatmaps summarizing their evolutionary histories, and proteins with similar profiles can be directly accessed. Two novel tools have been implemented for comparative genomics: a phylogenetic profile search that can be used to find proteins with a specific presence-absence profile and investigate their functions and, inversely, a GO profiling tool aimed at deciphering evolutionary histories of molecular functions, processes or cell components. In addition to the re-designed website, the OrthoInspector resource now provides a REST interface for programmatic access. OrthoInspector 3.0 is available at http://lbgi.fr/orthoinspectorv3.
Affiliation(s)
- Yannis Nevers
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Arnaud Kress
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Audrey Defosset
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Raymond Ripp
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Benjamin Linard
  - LIRMM, Univ Montpellier, CNRS, Montpellier, France; ISEM, Univ Montpellier, CNRS, IRD, EPHE, CIRAD, INRAP, Montpellier, France; AGAP, Univ Montpellier, CIRAD, INRA, Montpellier Supagro, Montpellier, France
- Julie D Thompson
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Olivier Poch
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Odile Lecompte
  - Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
4
Schiffer PH, Danchin EGJ, Burnell AM, Creevey CJ, Wong S, Dix I, O'Mahony G, Culleton BA, Rancurel C, Stier G, Martínez-Salazar EA, Marconi A, Trivedi U, Kroiher M, Thorne MAS, Schierenberg E, Wiehe T, Blaxter M. Signatures of the Evolution of Parthenogenesis and Cryptobiosis in the Genomes of Panagrolaimid Nematodes. iScience 2019; 21:587-602. [PMID: 31759330 PMCID: PMC6889759 DOI: 10.1016/j.isci.2019.10.039] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/21/2018] [Revised: 07/17/2019] [Accepted: 10/21/2019] [Indexed: 12/12/2022] Open
Abstract
Most animal species reproduce sexually and fully parthenogenetic lineages are usually short lived in evolution. Still, parthenogenesis may be advantageous as it avoids the cost of sex and permits colonization by single individuals. Panagrolaimid nematodes have colonized environments ranging from arid deserts to Arctic and Antarctic biomes. Many are obligatory meiotic parthenogens, and most have cryptobiotic abilities, being able to survive repeated cycles of complete desiccation and freezing. To identify systems that may contribute to these striking abilities, we sequenced and compared the genomes and transcriptomes of parthenogenetic and outcrossing panagrolaimid species, including cryptobionts and non-cryptobionts. The parthenogens are triploids, most likely originating through hybridization. Adaptation to cryptobiosis shaped the genomes of panagrolaimid nematodes and is associated with the expansion of gene families and signatures of selection on genes involved in cryptobiosis. All panagrolaimids have acquired genes through horizontal gene transfer, some of which are likely to contribute to cryptobiosis.
Affiliation(s)
- Philipp H Schiffer
  - CLOE, Department for Biosciences, University College London, London, UK; Zoologisches Institut, Universität zu Köln, 50674 Köln, Germany; Institut für Genetik, Universität zu Köln, 50674 Köln, Germany
- Ann M Burnell
  - Maynooth University Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
- Simon Wong
  - Irish Centre for High-End Computing, Tower Building, Trinity Technology & Enterprise Campus, Grand Canal Quay, Dublin D02 HP83, Ireland
- Ilona Dix
  - Maynooth University Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
- Georgina O'Mahony
  - Maynooth University Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland
- Bridget A Culleton
  - Maynooth University Department of Biology, National University of Ireland Maynooth, Maynooth, Co. Kildare, Ireland; Megazyme, Bray Business Park, Bray, Co. Wicklow A98 YV29, Ireland
- Gary Stier
  - Zoologisches Institut, Universität zu Köln, 50674 Köln, Germany
- Elizabeth A Martínez-Salazar
  - Unidad Académica de Ciencias Biológicas, Laboratorio de Colecciones Biológicas y Sistemática Molecular, Universidad Autónoma de Zacatecas, Zacatecas, México
- Aleksandra Marconi
  - Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK
- Urmi Trivedi
  - Edinburgh Genomics, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
- Michael Kroiher
  - Zoologisches Institut, Universität zu Köln, 50674 Köln, Germany
- Michael A S Thorne
  - British Antarctic Survey, Natural Environment Research Council, High Cross, Madingley Road, Cambridge CB3 0ET, UK
- Thomas Wiehe
  - Institut für Genetik, Universität zu Köln, 50674 Köln, Germany
- Mark Blaxter
  - Institute of Evolutionary Biology, The University of Edinburgh, Edinburgh EH9 3FL, UK; Edinburgh Genomics, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK
5
Drukewitz SH, von Reumont BM. The Significance of Comparative Genomics in Modern Evolutionary Venomics. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00163] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/21/2022] Open
6
Abstract
The distinction between orthologs and paralogs, genes that started diverging by speciation versus duplication, is relevant in a wide range of contexts, most notably phylogenetic tree inference and protein function annotation. In this chapter, we provide an overview of the methods used to infer orthology and paralogy. We survey both graph-based approaches (and their various grouping strategies) and tree-based approaches, which solve the more general problem of gene/species tree reconciliation. We discuss conceptual differences among the various orthology inference methods and databases and examine the difficult issue of verifying and benchmarking orthology predictions. Finally, we review typical applications of orthologous genes, groups, and reconciled trees and conclude with thoughts on future methodological developments.
7
OrthoList 2: A New Comparative Genomic Analysis of Human and Caenorhabditis elegans Genes. Genetics 2018; 210:445-461. [PMID: 30120140 DOI: 10.1534/genetics.118.301307] [Citation(s) in RCA: 184] [Impact Index Per Article: 30.7] [Received: 06/27/2018] [Accepted: 08/15/2018] [Indexed: 11/18/2022] Open
Abstract
OrthoList, a compendium of Caenorhabditis elegans genes with human orthologs compiled in 2011 by a meta-analysis of four orthology-prediction methods, has been a popular tool for identifying conserved genes for research into biological and disease mechanisms. However, the efficacy of orthology prediction depends on the accuracy of gene-model predictions, an ongoing process, and orthology-prediction algorithms have also been updated over time. Here we present OrthoList 2 (OL2), a new comparative genomic analysis between C. elegans and humans, and the first assessment of how changes over time affect the landscape of predicted orthologs between two species. Although we find that updates to the orthology-prediction methods significantly changed the landscape of C. elegans-human orthologs predicted by individual programs and, unexpectedly, reduced agreement among them, we also show that our meta-analysis approach "buffered" against changes in gene content. We show that adding results from more programs did not lead to many additions to the list and discuss reasons to avoid assigning "scores" based on support by individual orthology-prediction programs; the treatment of "legacy" genes no longer predicted by these programs; and the practical difficulties of updating due to encountering deprecated, changed, or retired gene identifiers. In addition, we consider what other criteria may support claims of orthology and alternative approaches to find potential orthologs that elude identification by these programs. Finally, we created a new web-based tool that allows for rapid searches of OL2 by gene identifiers, protein domains [InterPro and SMART (Simple Modular Architecture Research Tool)], or human disease associations [OMIM (Online Mendelian Inheritance in Man)], and also includes available RNA-interference resources to facilitate potential translational cross-species studies.
8
Nevers Y, Prasad MK, Poidevin L, Chennen K, Allot A, Kress A, Ripp R, Thompson JD, Dollfus H, Poch O, Lecompte O. Insights into Ciliary Genes and Evolution from Multi-Level Phylogenetic Profiling. Mol Biol Evol 2018; 34:2016-2034. [PMID: 28460059 PMCID: PMC5850483 DOI: 10.1093/molbev/msx146] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Indexed: 12/16/2022] Open
Abstract
Cilia (flagella) are important eukaryotic organelles, present in the Last Eukaryotic Common Ancestor, and are involved in cell motility and integration of extracellular signals. Ciliary dysfunction causes a class of genetic diseases known as ciliopathies; however, current knowledge of the underlying mechanisms is still limited and a better characterization of genes is needed. As cilia have been lost independently several times during evolution and they are subject to important functional variation between species, ciliary genes can be investigated through comparative genomics. We performed phylogenetic profiling by predicting orthologs of human protein-coding genes in 100 eukaryotic species. The analysis integrated three independent methods to predict a consensus set of 274 ciliary genes, including 87 new promising candidates. A fine-grained analysis of the phylogenetic profiles allowed a partitioning of ciliary genes into modules with distinct evolutionary histories and ciliary functions (assembly, movement, centriole, etc.) and thus propagation of potential annotations to previously undocumented genes. The cilia/basal body localization was experimentally confirmed for five of these previously unannotated proteins (LRRC23, LRRC34, TEX9, WDR27, and BIVM), validating the relevance of our approach. Furthermore, our multi-level analysis sheds light on the core gene sets retained in gamete-only flagellates or Ecdysozoa, for instance. By combining gene-centric and species-oriented analyses, this work reveals new ciliary and ciliopathy gene candidates and provides clues about the evolution of ciliary processes in the eukaryotic domain. Additionally, the positive and negative reference gene sets and the phylogenetic profile of human genes constructed during this study can be exploited in future work.
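As a rough illustration of phylogenetic profiling, the sketch below compares presence/absence profiles across species and returns the genes whose profile resembles a query gene. The gene names, the five-species profiles, the Jaccard measure, and the threshold are all hypothetical assumptions chosen for the example; the study's own scoring may well differ.

```python
def jaccard(p, q):
    """Jaccard similarity between two presence/absence profiles (tuples of 0/1)."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0

def similar_profiles(profiles, query, threshold=0.75):
    """Return (gene, similarity) pairs at or above the threshold, best first."""
    ref = profiles[query]
    hits = [
        (gene, jaccard(ref, prof))
        for gene, prof in profiles.items()
        if gene != query
    ]
    kept = [(gene, score) for gene, score in hits if score >= threshold]
    return sorted(kept, key=lambda t: (-t[1], t[0]))

# Hypothetical profiles: one column per species, 1 = ortholog present.
profiles = {
    "geneA": (1, 1, 0, 1, 0),  # query, absent where the trait was lost
    "geneB": (1, 1, 0, 1, 0),  # same loss pattern as geneA
    "geneC": (1, 1, 1, 1, 1),  # present everywhere
    "geneD": (0, 0, 1, 0, 1),  # complementary pattern
}
matches = similar_profiles(profiles, "geneA")
print(matches)
```

Genes that are lost in the same lineages as the query score highest, which is the intuition behind using shared presence/absence patterns to propose functionally related candidates.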
Affiliation(s)
- Yannis Nevers
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Megana K Prasad
  - Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Université de Strasbourg, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Strasbourg, France
- Laetitia Poidevin
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Kirsley Chennen
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Alexis Allot
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Arnaud Kress
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Raymond Ripp
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Julie D Thompson
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Hélène Dollfus
  - Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Université de Strasbourg, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Strasbourg, France; Centre de Référence pour les Affections Rares en Génétique Ophtalmologique, Service de Génétique Médicale, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Olivier Poch
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
- Odile Lecompte
  - Complex Systems and Translational Bioinformatics, ICube UMR 7357, Université de Strasbourg, Fédération de Médecine Translationnelle, Strasbourg, France
9
Monk J, Bosi E. Integration of Comparative Genomics with Genome-Scale Metabolic Modeling to Investigate Strain-Specific Phenotypical Differences. Methods Mol Biol 2018; 1716:151-175. [PMID: 29222753 DOI: 10.1007/978-1-4939-7528-0_7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
Genome-scale metabolic reconstructions are powerful resources that allow the translation of biological knowledge and genomic information into phenotypical predictions using a number of constraint-based methods. This approach has been applied in recent years to gain deep insights into the role of genes in the cellular phenotype at a systems level, driving the design of targeted experiments and paving the way for knowledge-based synthetic biology. The identification of genetic determinants underlying the variability at the phenotypical level is crucial to understand the evolutionary trajectories of a bacterial species. Recently, genome-scale metabolic models of different strains have been assembled to highlight the intra-species diversity at the metabolic level. The strain-specific metabolic capabilities and auxotrophies can be used to identify factors related to the lifestyle diversity of a bacterial species. In this chapter, we present the computational steps to perform genome-scale metabolic modeling in the context of comparative genomics, and the different challenges related to this task.
Affiliation(s)
- Jonathan Monk
  - Department of Bioengineering, University of California, La Jolla, CA, USA
- Emanuele Bosi
  - Department of Biology, University of Florence, Sesto Fiorentino, Italy
10
Allot A, Chennen K, Nevers Y, Poidevin L, Kress A, Ripp R, Thompson JD, Poch O, Lecompte O. MyGeneFriends: A Social Network Linking Genes, Genetic Diseases, and Researchers. J Med Internet Res 2017. [PMID: 28623182 PMCID: PMC5493784 DOI: 10.2196/jmir.6676] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/08/2023] Open
Abstract
Background: The constant and massive increase of biological data offers unprecedented opportunities to decipher the function and evolution of genes and their roles in human diseases. However, the multiplicity of sources and flow of data mean that efficient access to useful information and knowledge production has become a major challenge. This challenge can be addressed by taking inspiration from Web 2.0 and particularly social networks, which are at the forefront of big data exploration and human-data interaction.
Objective: MyGeneFriends is a Web platform inspired by social networks, devoted to genetic disease analysis, and organized around three types of proactive agents: genes, humans, and genetic diseases. The aim of this study was to improve exploration and exploitation of biological, postgenomic era big data.
Methods: MyGeneFriends leverages conventions popularized by top social networks (Facebook, LinkedIn, etc), such as networks of friends, profile pages, friendship recommendations, affinity scores, news feeds, content recommendation, and data visualization.
Results: MyGeneFriends provides simple and intuitive interactions with data through evaluation and visualization of connections (friendships) between genes, humans, and diseases. The platform suggests new friends and publications and allows agents to follow the activity of their friends. It dynamically personalizes information depending on the user’s specific interests and provides an efficient way to share information with collaborators. Furthermore, the user’s behavior itself generates new information that constitutes an added value integrated in the network, which can be used to discover new connections between biological agents.
Conclusions: We have developed MyGeneFriends, a Web platform leveraging conventions from popular social networks to redefine the relationship between humans and biological big data and improve human processing of biomedical data.
MyGeneFriends is available at lbgi.fr/mygenefriends.
Affiliation(s)
- Alexis Allot
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Kirsley Chennen
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Yannis Nevers
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Laetitia Poidevin
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Arnaud Kress
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Raymond Ripp
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Julie Dawn Thompson
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Olivier Poch
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
- Odile Lecompte
  - ICUBE UMR 7357, Complex Systems and Translational Bioinformatics, Université de Strasbourg - CNRS - FMTS, Strasbourg, France
11
Kaduk M, Riegler C, Lemp O, Sonnhammer ELL. HieranoiDB: a database of orthologs inferred by Hieranoid. Nucleic Acids Res 2017; 45:D687-D690. [PMID: 27742821 PMCID: PMC5210627 DOI: 10.1093/nar/gkw923] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/01/2016] [Revised: 09/30/2016] [Accepted: 10/05/2016] [Indexed: 02/04/2023] Open
Abstract
HieranoiDB (http://hieranoiDB.sbc.su.se) is a freely available on-line database for hierarchical groups of orthologs inferred by the Hieranoid algorithm. Hieranoid infers orthologs at each node in a species guide tree with the InParanoid algorithm as it progresses from the leaves to the root. Here we present a database, HieranoiDB, with a web interface that makes it easy to search and visualize the output of Hieranoid, and to download it in various formats. Searching can be performed using protein description, identifier or sequence. In this first version, orthologs are available for the 66 Quest for Orthologs reference proteomes. The ortholog trees are shown graphically and interactively with marked speciation and duplication nodes that show the inferred evolutionary scenario, and allow for correct extraction of predicted orthologs from the Hieranoid trees.
Affiliation(s)
- Mateusz Kaduk
  - Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
- Christian Riegler
  - Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
  - FH OÖ - University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Oliver Lemp
  - Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
  - FH OÖ - University of Applied Sciences Upper Austria, Hagenberg 4232, Austria
- Erik L L Sonnhammer
  - Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Box 1031, 17121 Solna, Sweden
12
Abstract
Correctly estimating the age of a gene or gene family is important for a variety of fields, including molecular evolution, comparative genomics, and phylogenetics, and increasingly for systems biology and disease genetics. However, most studies use only a point estimate of a gene’s age, neglecting the substantial uncertainty involved in this estimation. Here, we characterize this uncertainty by investigating the effect of algorithm choice on gene-age inference and calculate consensus gene ages with attendant error distributions for a variety of model eukaryotes. We use 13 orthology inference algorithms to create gene-age datasets and then characterize the error around each age-call on a per-gene and per-algorithm basis. Systematic error was found to be a large factor in estimating gene age, suggesting that simple consensus algorithms are not enough to give a reliable point estimate. We also found that different sources of error can affect downstream analyses, such as gene ontology enrichment. Our consensus gene-age datasets, with associated error terms, are made fully available at geneages.org so that researchers can propagate this uncertainty through their analyses.
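To make the contrast between a point estimate and a distribution of age calls concrete, here is a minimal sketch of a naive per-gene summary. The algorithm labels and age categories are hypothetical, and the majority vote shown is exactly the kind of simple consensus the abstract cautions against relying on alone; it is not the authors' error model.

```python
from collections import Counter

def age_call_summary(calls):
    """Summarize the age calls made for one gene by several algorithms.

    calls maps an algorithm label to the age category it assigned.
    Returns the most frequent category and the fraction of calls per category.
    """
    counts = Counter(calls.values())
    total = sum(counts.values())
    # Most frequent category; ties are broken alphabetically for determinism.
    consensus = min(counts, key=lambda age: (-counts[age], age))
    distribution = {age: n / total for age, n in sorted(counts.items())}
    return consensus, distribution

# Hypothetical calls for a single gene from four algorithms.
calls = {
    "alg1": "Eukaryota",
    "alg2": "Eukaryota",
    "alg3": "Opisthokonta",
    "alg4": "Eukaryota",
}
consensus, distribution = age_call_summary(calls)
print(consensus, distribution)
```

Keeping the full distribution alongside the consensus is what lets a downstream analysis carry the disagreement forward instead of treating the majority call as certain.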
Affiliation(s)
- Benjamin J Liebeskind
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin; Center for Computational Biology and Bioinformatics, University of Texas at Austin
- Claire D McWhite
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
- Edward M Marcotte
  - Department of Molecular Biosciences, Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin
13
Standardized benchmarking in the quest for orthologs. Nat Methods 2016; 13:425-30. [PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Received: 01/28/2016] [Accepted: 03/09/2016] [Indexed: 11/23/2022]
Abstract
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision–recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.
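The precision-recall trade-off mentioned in this abstract can be shown with a small sketch that scores predicted ortholog pairs against a reference set. The protein IDs, the two prediction sets, and the reference pairs are hypothetical; real benchmarks, where the true history is unknown, rely on more elaborate proxies than a fixed reference list.

```python
def precision_recall(predicted, reference):
    """Score predicted ortholog pairs against reference pairs.

    Pairs are treated as unordered, so ("h1", "m1") equals ("m1", "h1").
    """
    pred = {frozenset(p) for p in predicted}
    ref = {frozenset(p) for p in reference}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall

# Hypothetical reference pairs between two species.
reference = [("h1", "m1"), ("h2", "m2"), ("h3", "m3"), ("h4", "m4")]

# A strict method predicts few pairs; a permissive one predicts many.
strict = [("h1", "m1"), ("h2", "m2")]
permissive = [
    ("m1", "h1"), ("h2", "m2"), ("h3", "m3"),
    ("h4", "m9"), ("h5", "m5"), ("h6", "m6"),
]

strict_scores = precision_recall(strict, reference)
permissive_scores = precision_recall(permissive, reference)
print(strict_scores, permissive_scores)
```

Here the strict method is fully precise but recovers only half of the reference pairs, while the permissive one recovers more pairs at the cost of more false predictions, so which is preferable depends on the application.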
14
Howe K, Schiffer PH, Zielinski J, Wiehe T, Laird GK, Marioni JC, Soylemez O, Kondrashov F, Leptin M. Structure and evolutionary history of a large family of NLR proteins in the zebrafish. Open Biol 2016; 6:160009. [PMID: 27248802 PMCID: PMC4852459 DOI: 10.1098/rsob.160009] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 01/11/2016] [Accepted: 04/05/2016] [Indexed: 12/14/2022] Open
Abstract
Multicellular eukaryotes have evolved a range of mechanisms for immune recognition. A widespread family involved in innate immunity are the NACHT-domain and leucine-rich-repeat-containing (NLR) proteins. Mammals have small numbers of NLR proteins, whereas in some species, mostly those without adaptive immune systems, NLRs have expanded into very large families. We describe a family of nearly 400 NLR proteins encoded in the zebrafish genome. The proteins share a defining overall structure, which arose in fishes after a fusion of the core NLR domains with a B30.2 domain, but can be subdivided into four groups based on their NACHT domains. Gene conversion acting differentially on the NACHT and B30.2 domains has shaped the family and created the groups. Evidence of positive selection in the B30.2 domain indicates that this domain rather than the leucine-rich repeats acts as the pathogen recognition module. In an unusual chromosomal organization, the majority of the genes are located on one chromosome arm, interspersed with other large multigene families, including a new family encoding zinc-finger proteins. The NLR-B30.2 proteins represent a new family with diversity in the specific recognition module that is present in fishes in spite of the parallel existence of an adaptive immune system.
Affiliation(s)
- Philipp H Schiffer
- Institut für Genetik, Universität zu Köln, Köln, Germany
- The European Molecular Biology Laboratory, Heidelberg, Germany
- Thomas Wiehe
- Institut für Genetik, Universität zu Köln, Köln, Germany
- John C Marioni
- Wellcome Trust Sanger Institute, Cambridge, UK
- The European Molecular Biology Laboratory, The European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Onuralp Soylemez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Fyodor Kondrashov
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 88 Dr. Aiguader, 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 23 Pg. Lluís Companys, 08010 Barcelona, Spain
- Maria Leptin
- Institut für Genetik, Universität zu Köln, Köln, Germany
- The European Molecular Biology Laboratory, Heidelberg, Germany
15
Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, Rattei T, Mende DR, Sunagawa S, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res 2016; 44:D286-93. [PMID: 26582926 PMCID: PMC4702882 DOI: 10.1093/nar/gkv1248] [Citation(s) in RCA: 1387] [Impact Index Per Article: 173.4] [Received: 09/14/2015] [Revised: 10/30/2015] [Accepted: 11/02/2015] [Indexed: 01/19/2023] Open
Abstract
eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de.
Affiliation(s)
- Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Damian Szklarczyk
- Institute of Molecular Life Sciences, University of Zurich, Zurich 8057, Switzerland
- Bioinformatics/Systems Biology Group, Swiss Institute of Bioinformatics (SIB), Zurich 8057, Switzerland
- Kristoffer Forslund
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Helen Cook
- The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N 2200, Denmark
- Davide Heller
- Institute of Molecular Life Sciences, University of Zurich, Zurich 8057, Switzerland
- Bioinformatics/Systems Biology Group, Swiss Institute of Bioinformatics (SIB), Zurich 8057, Switzerland
- Mathias C Walter
- Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health (GmbH), Neuherberg 85764, Germany
- Thomas Rattei
- CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna 1090, Austria
- Daniel R Mende
- Daniel K. Inouye Center for Microbial Oceanography: Research and Education, University of Hawaii, Honolulu, HI 96822, USA
- Shinichi Sunagawa
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Michael Kuhn
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany
- Lars Juhl Jensen
- The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N 2200, Denmark
- Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich, Zurich 8057, Switzerland
- Bioinformatics/Systems Biology Group, Swiss Institute of Bioinformatics (SIB), Zurich 8057, Switzerland
- Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Molecular Medicine Partnership Unit (MMPU), University Hospital Heidelberg and European Molecular Biology Laboratory, Heidelberg 69117, Germany
- Max Delbrück Centre for Molecular Medicine, Berlin 13125, Germany