1
|
Koper K, Han SW, Kothadia R, Salamon H, Yoshikuni Y, Maeda HA. Multisubstrate specificity shaped the complex evolution of the aminotransferase family across the tree of life. Proc Natl Acad Sci U S A 2024; 121:e2405524121. [PMID: 38885378 PMCID: PMC11214133 DOI: 10.1073/pnas.2405524121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2024] [Accepted: 05/14/2024] [Indexed: 06/20/2024] Open
Abstract
Aminotransferases (ATs) are an ancient enzyme family that play central roles in core nitrogen metabolism, essential to all organisms. However, many of the AT enzyme functions remain poorly defined, limiting our fundamental understanding of the nitrogen metabolic networks that exist in different organisms. Here, we traced the deep evolutionary history of the AT family by analyzing AT enzymes from 90 species spanning the tree of life (ToL). We found that each organism has maintained a relatively small and constant number of ATs. Mapping the distribution of ATs across the ToL uncovered that many essential AT reactions are carried out by taxon-specific AT enzymes due to wide-spread nonorthologous gene displacements. This complex evolutionary history explains the difficulty of homology-based AT functional prediction. Biochemical characterization of diverse aromatic ATs further revealed their broad substrate specificity, unlike other core metabolic enzymes that evolved to catalyze specific reactions today. Interestingly, however, we found that these AT enzymes that diverged over billion years share common signatures of multisubstrate specificity by employing different nonconserved active site residues. These findings illustrate that AT family enzymes had leveraged their inherent substrate promiscuity to maintain a small yet distinct set of multifunctional AT enzymes in different taxa. This evolutionary history of versatile ATs likely contributed to the establishment of robust and diverse nitrogen metabolic networks that exist throughout the ToL. The study provides a critical foundation to systematically determine diverse AT functions and underlying nitrogen metabolic networks across the ToL.
Collapse
Affiliation(s)
- Kaan Koper
- Department of Botany, University of Wisconsin-Madison, Madison, WI53706
| | - Sang-Woo Han
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Department of Biotechnology, Konkuk University, Chungju27478, South Korea
| | - Ramani Kothadia
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA94720
| | - Hugh Salamon
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA94720
| | - Yasuo Yoshikuni
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA94720
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA94720
- Center for Advanced Bioenergy and Bioproducts Innovation, Lawrence Berkeley National Laboratory, Berkeley, CA94720
- Global Center for Food, Land, and Water Resources, Research Faculty of Agriculture, Hokkaido University, Hokkaido, Japan 060-8589
- Institute of Global Innovation Research, Tokyo University of Agriculture and Technology, Tokyo183-8538, Japan
| | - Hiroshi A. Maeda
- Department of Botany, University of Wisconsin-Madison, Madison, WI53706
| |
Collapse
|
2
|
Zhang P, Zhang B, Ji Y, Jiao J, Zhang Z, Tian C. Cofitness network connectivity determines a fuzzy essential zone in open bacterial pangenome. MLIFE 2024; 3:277-290. [PMID: 38948139 PMCID: PMC11211677 DOI: 10.1002/mlf2.12132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/07/2023] [Revised: 04/20/2024] [Accepted: 04/24/2024] [Indexed: 07/02/2024]
Abstract
Most in silico evolutionary studies commonly assumed that core genes are essential for cellular function, while accessory genes are dispensable, particularly in nutrient-rich environments. However, this assumption is seldom tested genetically within the pangenome context. In this study, we conducted a robust pangenomic Tn-seq analysis of fitness genes in a nutrient-rich medium for Sinorhizobium strains with a canonical open pangenome. To evaluate the robustness of fitness category assignment, Tn-seq data for three independent mutant libraries per strain were analyzed by three methods, which indicates that the Hidden Markov Model (HMM)-based method is most robust to variations between mutant libraries and not sensitive to data size, outperforming the Bayesian and Monte Carlo simulation-based methods. Consequently, the HMM method was used to classify the fitness category. Fitness genes, categorized as essential (ES), advantage (GA), and disadvantage (GD) genes for growth, are enriched in core genes, while nonessential genes (NE) are over-represented in accessory genes. Accessory ES/GA genes showed a lower fitness effect than core ES/GA genes. Connectivity degrees in the cofitness network decrease in the order of ES, GD, and GA/NE. In addition to accessory genes, 1599 out of 3284 core genes display differential essentiality across test strains. Within the pangenome core, both shared quasi-essential (ES and GA) and strain-dependent fitness genes are enriched in similar functional categories. Our analysis demonstrates a considerable fuzzy essential zone determined by cofitness connectivity degrees in Sinorhizobium pangenome and highlights the power of the cofitness network in understanding the genetic basis of ever-increasing prokaryotic pangenome data.
Collapse
Affiliation(s)
- Pan Zhang
- State Key Laboratory of Plant Environmental Resilience, and College of Biological SciencesChina Agricultural UniversityBeijingChina
- MOA Key Laboratory of Soil Microbiology, and Rhizobium Research CenterChina Agricultural UniversityBeijingChina
- Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced TechnologyChinese Academy of SciencesShenzhenChina
| | - Biliang Zhang
- MOA Key Laboratory of Soil Microbiology, and Rhizobium Research CenterChina Agricultural UniversityBeijingChina
- State Key Laboratory of Livestock and Poultry Biotechnology Breeding, and College of Biological SciencesChina Agricultural UniversityBeijingChina
| | - Yuan‐Yuan Ji
- State Key Laboratory of Plant Environmental Resilience, and College of Biological SciencesChina Agricultural UniversityBeijingChina
- MOA Key Laboratory of Soil Microbiology, and Rhizobium Research CenterChina Agricultural UniversityBeijingChina
| | - Jian Jiao
- State Key Laboratory of Plant Environmental Resilience, and College of Biological SciencesChina Agricultural UniversityBeijingChina
- MOA Key Laboratory of Soil Microbiology, and Rhizobium Research CenterChina Agricultural UniversityBeijingChina
| | - Ziding Zhang
- State Key Laboratory of Livestock and Poultry Biotechnology Breeding, and College of Biological SciencesChina Agricultural UniversityBeijingChina
| | - Chang‐Fu Tian
- State Key Laboratory of Plant Environmental Resilience, and College of Biological SciencesChina Agricultural UniversityBeijingChina
- MOA Key Laboratory of Soil Microbiology, and Rhizobium Research CenterChina Agricultural UniversityBeijingChina
| |
Collapse
|
3
|
Mutwil M, Fernie AR. Ancestral genome reconstruction for studies of the green lineage. MOLECULAR PLANT 2023; 16:657-659. [PMID: 36871157 DOI: 10.1016/j.molp.2023.03.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/06/2023] [Revised: 02/23/2023] [Accepted: 03/02/2023] [Indexed: 06/18/2023]
Affiliation(s)
- Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore.
| | - Alisdair R Fernie
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
| |
Collapse
|
4
|
Reconstruction of hundreds of reference ancestral genomes across the eukaryotic kingdom. Nat Ecol Evol 2023; 7:355-366. [PMID: 36646945 PMCID: PMC9998269 DOI: 10.1038/s41559-022-01956-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2022] [Accepted: 11/22/2022] [Indexed: 01/18/2023]
Abstract
Ancestral sequence reconstruction is a fundamental aspect of molecular evolution studies and can trace small-scale sequence modifications through the evolution of genomes and species. In contrast, fine-grained reconstructions of ancestral genome organizations are still in their infancy, limiting our ability to draw comprehensive views of genome and karyotype evolution. Here we reconstruct the detailed gene contents and organizations of 624 ancestral vertebrate, plant, fungi, metazoan and protist genomes, 183 of which are near-complete chromosomal gene order reconstructions. Reconstructed ancestral genomes are similar to their descendants in terms of gene content as expected and agree precisely with reference cytogenetic and in silico reconstructions when available. By comparing successive ancestral genomes along the phylogenetic tree, we estimate the intra- and interchromosomal rearrangement history of all major vertebrate clades at high resolution. This freely available resource introduces the possibility to follow evolutionary processes at genomic scales in chronological order, across multiple clades and without relying on a single extant species as reference.
Collapse
|
5
|
Koper K, Han SW, Pastor DC, Yoshikuni Y, Maeda HA. Evolutionary Origin and Functional Diversification of Aminotransferases. J Biol Chem 2022; 298:102122. [PMID: 35697072 PMCID: PMC9309667 DOI: 10.1016/j.jbc.2022.102122] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 06/06/2022] [Accepted: 06/07/2022] [Indexed: 11/30/2022] Open
Abstract
Aminotransferases (ATs) are pyridoxal 5′-phosphate–dependent enzymes that catalyze the transamination reactions between amino acid donor and keto acid acceptor substrates. Modern AT enzymes constitute ∼2% of all classified enzymatic activities, play central roles in nitrogen metabolism, and generate multitude of primary and secondary metabolites. ATs likely diverged into four distinct AT classes before the appearance of the last universal common ancestor and further expanded to a large and diverse enzyme family. Although the AT family underwent an extensive functional specialization, many AT enzymes retained considerable substrate promiscuity and multifunctionality because of their inherent mechanistic, structural, and functional constraints. This review summarizes the evolutionary history, diverse metabolic roles, reaction mechanisms, and structure–function relationships of the AT family enzymes, with a special emphasis on their substrate promiscuity and multifunctionality. Comprehensive characterization of AT substrate specificity is still needed to reveal their true metabolic functions in interconnecting various branches of the nitrogen metabolic network in different organisms.
Collapse
Affiliation(s)
- Kaan Koper
- Department of Botany, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Sang-Woo Han
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | | | - Yasuo Yoshikuni
- The US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Global Center for Food, Land, and Water Resources, Research Faculty of Agriculture, Hokkaido University, Hokkaido 060-8589, Japan
| | - Hiroshi A Maeda
- Department of Botany, University of Wisconsin-Madison, Madison, WI, 53706, USA
| |
Collapse
|
6
|
Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou L, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Sci 2022; 31:8-22. [PMID: 34717010 PMCID: PMC8740835 DOI: 10.1002/pro.4218] [Citation(s) in RCA: 501] [Impact Index Per Article: 250.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Revised: 10/24/2021] [Accepted: 10/26/2021] [Indexed: 02/03/2023]
Abstract
Phylogenetics is a powerful tool for analyzing protein sequences, by inferring their evolutionary relationships to other proteins. However, phylogenetics analyses can be challenging: they are computationally expensive and must be performed carefully in order to avoid systematic errors and artifacts. Protein Analysis THrough Evolutionary Relationships (PANTHER; http://pantherdb.org) is a publicly available, user-focused knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline that includes computational and manual processes and quality control steps. First, fully reconciled phylogenetic trees (including ancestral protein sequences) are reconstructed for a set of "reference" protein sequences obtained from fully sequenced genomes of organisms across the tree of life. Second, the resulting phylogenetic trees are manually reviewed and annotated with function evolution events: inferred gains and losses of protein function along branches of the phylogenetic tree. Here, we describe in detail the current contents of PANTHER, how those contents are generated, and how they can be used in a variety of applications. The PANTHER knowledgebase can be downloaded or accessed via an extensive API. In addition, PANTHER provides software tools to facilitate the application of the knowledgebase to common protein sequence analysis tasks: exploring an annotated genome by gene function; performing "enrichment analysis" of lists of genes; annotating a single sequence or large batch of sequences by homology; and assessing the likelihood that a genetic variant at a particular site in a protein will have deleterious effects.
Collapse
Affiliation(s)
- Paul D. Thomas
- Division of Bioinformatics, Department of Population and Public Health SciencesUniversity of Southern CaliforniaLos AngelesCaliforniaUSA
| | - Dustin Ebert
- Division of Bioinformatics, Department of Population and Public Health SciencesUniversity of Southern CaliforniaLos AngelesCaliforniaUSA
| | - Anushya Muruganujan
- Division of Bioinformatics, Department of Population and Public Health SciencesUniversity of Southern CaliforniaLos AngelesCaliforniaUSA
| | - Tremayne Mushayahama
- Division of Bioinformatics, Department of Population and Public Health SciencesUniversity of Southern CaliforniaLos AngelesCaliforniaUSA
| | - Laurent‐Philippe Albou
- Division of Bioinformatics, Department of Population and Public Health SciencesUniversity of Southern CaliforniaLos AngelesCaliforniaUSA
| | - Huaiyu Mi
- Division of Bioinformatics, Department of Population and Public Health SciencesUniversity of Southern CaliforniaLos AngelesCaliforniaUSA
| |
Collapse
|
7
|
Psomopoulos FE, van Helden J, Médigue C, Chasapi A, Ouzounis CA. Ancestral state reconstruction of metabolic pathways across pangenome ensembles. Microb Genom 2021; 6. [PMID: 32924924 PMCID: PMC7725326 DOI: 10.1099/mgen.0.000429] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
As genome sequencing efforts are unveiling the genetic diversity of the biosphere with an unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, pivotal representations of key functional modules of cellular function. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, as opposed to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries. The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.
Collapse
Affiliation(s)
- Fotis E Psomopoulos
- Institute of Applied Biosciences (INAB), Center for Research & Technology Hellas (CERTH), GR-57001 Thessalonica, Greece
| | - Jacques van Helden
- Lab. Technological Advances for Genomics & Clinics (TAGC), Université d'Aix-Marseille (AMU), INSERM Unit U1090, 163, Avenue de Luminy, 13288 Marseille cedex 09, France
| | - Claudine Médigue
- UMR 8030, CNRS, Université Evry-Val-d'Essonne, CEA, Institut de Biologie François Jacob - Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, Evry, France
| | - Anastasia Chasapi
- Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Center for Research & Technology Hellas (CERTH), GR-57001 Thessalonica, Greece
| | - Christos A Ouzounis
- Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Center for Research & Technology Hellas (CERTH), GR-57001 Thessalonica, Greece
| |
Collapse
|
8
|
Linard B, Ebersberger I, McGlynn SE, Glover N, Mochizuki T, Patricio M, Lecompte O, Nevers Y, Thomas PD, Gabaldón T, Sonnhammer E, Dessimoz C, Uchiyama I. Ten Years of Collaborative Progress in the Quest for Orthologs. Mol Biol Evol 2021; 38:3033-3045. [PMID: 33822172 PMCID: PMC8321534 DOI: 10.1093/molbev/msab098] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2020] [Revised: 02/07/2021] [Accepted: 04/01/2021] [Indexed: 12/19/2022] Open
Abstract
Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology-evolutionary relatedness-is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit-from domains to networks. This meeting brought into light several challenges to come: leveraging orthology computations to get the most of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.
Collapse
Affiliation(s)
- Benjamin Linard
- LIRMM, University of Montpellier, CNRS, Montpellier, France.,SPYGEN, Le Bourget-du-Lac, France
| | - Ingo Ebersberger
- Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany.,Senckenberg Biodiversity and Climate Research Centre (S-BIKF), Frankfurt, Germany.,LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt, Germany
| | - Shawn E McGlynn
- Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan.,Blue Marble Space Institute of Science, Seattle, WA, USA
| | - Natasha Glover
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Tomohiro Mochizuki
- Earth-Life Science Institute, Tokyo Institute of Technology, Meguro, Tokyo, Japan
| | - Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
| | - Yannis Nevers
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
| | - Toni Gabaldón
- Barcelona Supercomputing Centre (BCS-CNS), Jordi Girona, Barcelona, Spain.,Institute for Research in Biomedicine (IRB), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain.,Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Christophe Dessimoz
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,Department of Computer Science, University College London, London, United Kingdom.,Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| | - Ikuo Uchiyama
- Department of Theoretical Biology, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi, Japan
| | | |
Collapse
|
9
|
Deutekom ES, Snel B, van Dam TJP. Benchmarking orthology methods using phylogenetic patterns defined at the base of Eukaryotes. Brief Bioinform 2020; 22:5906198. [PMID: 32935832 PMCID: PMC8138875 DOI: 10.1093/bib/bbaa206] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2020] [Revised: 08/10/2020] [Accepted: 08/11/2020] [Indexed: 12/26/2022] Open
Abstract
Insights into the evolution of ancestral complexes and pathways are generally achieved through careful and time-intensive manual analysis often using phylogenetic profiles of the constituent proteins. This manual analysis limits the possibility of including more protein-complex components, repeating the analyses for updated genome sets or expanding the analyses to larger scales. Automated orthology inference should allow such large-scale analyses, but substantial differences between orthologous groups generated by different approaches are observed. We evaluate orthology methods for their ability to recapitulate a number of observations that have been made with regard to genome evolution in eukaryotes. Specifically, we investigate phylogenetic profile similarity (co-occurrence of complexes), the last eukaryotic common ancestor’s gene content, pervasiveness of gene loss and the overlap with manually determined orthologous groups. Moreover, we compare the inferred orthologies to each other. We find that most orthology methods reconstruct a large last eukaryotic common ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence. At the same time, derived orthologous groups show imperfect overlap with manually curated orthologous groups. There is no strong indication of which orthology method performs better than another on individual or all of these aspects. Counterintuitively, despite the orthology methods behaving similarly regarding large-scale evaluation, the obtained orthologous groups differ vastly from one another. Availability and implementation The data and code underlying this article are available in github and/or upon reasonable request to the corresponding author: https://github.com/ESDeutekom/ComparingOrthologies.
Collapse
Affiliation(s)
| | - Berend Snel
- Corresponding author: Berend Snel, Padualaan 8, 358CH Utrecht, The Netherlands. Tel.: +31(0)30 253 8102; E-mail:
| | | |
Collapse
|
10
|
Nagy LG, Merényi Z, Hegedüs B, Bálint B. Novel phylogenetic methods are needed for understanding gene function in the era of mega-scale genome sequencing. Nucleic Acids Res 2020; 48:2209-2219. [PMID: 31943056 PMCID: PMC7049691 DOI: 10.1093/nar/gkz1241] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Revised: 12/15/2019] [Accepted: 12/31/2019] [Indexed: 12/21/2022] Open
Abstract
Ongoing large-scale genome sequencing projects are forecasting a data deluge that will almost certainly overwhelm current analytical capabilities of evolutionary genomics. In contrast to population genomics, there are no standardized methods in evolutionary genomics for extracting evolutionary and functional (e.g. gene-trait association) signal from genomic data. Here, we examine how current practices of multi-species comparative genomics perform in this aspect and point out that many genomic datasets are under-utilized due to the lack of powerful methodologies. As a result, many current analyses emphasize gene families for which some functional data is already available, resulting in a growing gap between functionally well-characterized genes/organisms and the universe of unknowns. This leaves unknown genes on the 'dark side' of genomes, a problem that will not be mitigated by sequencing more and more genomes, unless we develop tools to infer functional hypotheses for unknown genes in a systematic manner. We provide an inventory of recently developed methods capable of predicting gene-gene and gene-trait associations based on comparative data, then argue that realizing the full potential of whole genome datasets requires the integration of phylogenetic comparative methods into genomics, a rich but underutilized toolbox for looking into the past.
Collapse
Affiliation(s)
- László G Nagy
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Zsolt Merényi
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Botond Hegedüs
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| | - Balázs Bálint
- Synthetic and Systems Biology Unit, Institute of Biochemistry, Biological Research Centre, Temesvari krt 62. Szeged 6726, Hungary
| |
Collapse
|
11
|
Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, da Silva AS, Wang Y, Sonnhammer E, Thomas PD. Advances and Applications in the Quest for Orthologs. Mol Biol Evol 2020; 36:2157-2164. [PMID: 31241141 PMCID: PMC6759064 DOI: 10.1093/molbev/msz150] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.
Collapse
Affiliation(s)
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution & Environment, University College London, London, United Kingdom.,Department of Computer Science, University College London, London, United Kingdom
| | - Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany.,Senckenberg Biodiversity and Climate Research Centre (BIK-F), Frankfurt, Germany.,LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
| | - Sofia K Forslund
- Experimental and Clinical Research Center, A Cooperation of Charité-Universitätsmedizin Berlin and Max Delbruck Center for Molecular Medicine, Berlin, Germany.,Max Delbruck Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany.,Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität u Berlin, Berlin, Germany.,Berlin Institute of Health (BIH), Berlin, Germany.,Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Toni Gabaldón
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain.,ICREA, Barcelona, Spain
| | - Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Cécile Pereira
- Eura Nova, Marseille, France.,Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL
| | - Alan Sousa da Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Yan Wang
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California-Riverside, Riverside, CA
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA
| |
Collapse
|
12
|
Rigden DJ, Fernández X. The 26th annual Nucleic Acids Research database issue and Molecular Biology Database Collection. Nucleic Acids Res 2019; 47:D1-D7. [PMID: 30626175 PMCID: PMC6323895 DOI: 10.1093/nar/gky1267] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
The 2019 Nucleic Acids Research (NAR) Database Issue contains 168 papers spanning molecular biology. Among them, 64 are new and another 92 are updates describing resources that appeared in the Issue previously. The remaining 12 are updates on databases most recently published elsewhere. This Issue contains two Breakthrough articles, on the Virtual Metabolic Human (VMH) database which links human and gut microbiota metabolism with diet and disease, and Vibrism DB, a database of mouse brain anatomy and gene (co-)expression with sophisticated visualization and session sharing. Major returning nucleic acid databases include RNAcentral, miRBase and LncRNA2Target. Protein sequence databases include UniProtKB, InterPro and Pfam, while wwPDB and RCSB cover protein structure. STRING and KEGG update in the section on metabolism and pathways. Microbial genomes are covered by IMG/M and resources for human and model organism genomics include Ensembl, UCSC Genome Browser, GENCODE and Flybase. Genomic variation and disease are well-covered by GWAS Catalog, PopHumanScan, OMIM and COSMIC, CADD being another major newcomer. Major new proteomics resources reporting here include iProX and jPOSTdb. The entire database issue is freely available online on the NAR website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 506 entries, adding 66 new resources and eliminating 147 discontinued URLs, bringing the current total to 1613 databases. It is available at http://www.oxfordjournals.org/nar/database/c.
Collapse
Affiliation(s)
- Daniel J Rigden
- Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
| | | |
Collapse
|