1
Martínez-Redondo GI, Vargas-Chávez C, Eleftheriadi K, Benítez-Álvarez L, Vázquez-Valls M, Fernández R. MATEdb2, a Collection of High-Quality Metazoan Proteomes across the Animal Tree of Life to Speed Up Phylogenomic Studies. Genome Biol Evol 2024; 16:evae235. [PMID: 39540856 PMCID: PMC11534026 DOI: 10.1093/gbe/evae235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/17/2024] [Indexed: 11/16/2024] Open
Abstract
Recent advances in high-throughput sequencing have exponentially increased the amount of genomic data available for animals (Metazoa) in recent decades, with high-quality chromosome-level genomes being published almost daily. Nevertheless, generating a new genome is not an easy task due to the high cost of genome sequencing, the high complexity of assembly, and the lack of standardized protocols for genome annotation. The lack of consensus in the annotation and publication of genome files hinders research: researchers lose time reformatting the files for their purposes, and the quality of the genetic repertoire used in an evolutionary study can also suffer. Thus, transcriptomes obtained using the same pipeline remain a valuable proxy for the genetic content of species, one that is easier to obtain, cheaper, and more comparable than genomes. In a previous study, we presented the Metazoan Assemblies from Transcriptomic Ensembles database (MATEdb), a repository of high-quality transcriptomic and genomic data for the two most diverse animal phyla, Arthropoda and Mollusca. Here, we present the newest version of MATEdb (MATEdb2) that overcomes some of the previous limitations of our database: (i) we include data from all animal phyla where public data are available, and (ii) we provide gene annotations extracted from the original GFF genome files using the same pipeline. In total, we provide proteomes inferred from high-quality transcriptomic or genomic data for almost 1,000 animal species, including the longest isoforms, all isoforms, and functional annotation based on sequence homology and protein language models, as well as the embedding representations of the sequences. We believe this new version of MATEdb will accelerate research on animal phylogenomics while saving thousands of hours of computational work in a plea for open, greener, and collaborative science.
Affiliation(s)
- Gemma I Martínez-Redondo
  - Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra), 08003 Barcelona, Spain
- Carlos Vargas-Chávez
  - Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra), 08003 Barcelona, Spain
- Klara Eleftheriadi
  - Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra), 08003 Barcelona, Spain
- Lisandra Benítez-Álvarez
  - Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra), 08003 Barcelona, Spain
- Marçal Vázquez-Valls
  - Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra), 08003 Barcelona, Spain
- Rosa Fernández
  - Metazoa Phylogenomics Lab, Biodiversity Program, Institute of Evolutionary Biology (CSIC-University Pompeu Fabra), 08003 Barcelona, Spain
2
Sobala ŁF. LukProt: A Database of Eukaryotic Predicted Proteins Designed for Investigations of Animal Origins. Genome Biol Evol 2024; 16:evae231. [PMID: 39431411 PMCID: PMC11534060 DOI: 10.1093/gbe/evae231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 10/07/2024] [Accepted: 10/11/2024] [Indexed: 10/22/2024] Open
Abstract
The origins and early evolution of animals are subjects with many outstanding questions. One problem faced by researchers trying to answer them is the absence of a comprehensive database with sequences from nonbilaterians. Publicly available data are plentiful but scattered and often not associated with proper metadata. A new database presented in this paper, LukProt, is an attempt at solving this issue. The database contains protein sequences obtained mostly from genomic, transcriptomic, and metagenomic studies and is an extension of EukProt (Richter DJ, Berney C, Strassert JFH, Poh Y-P, Herman EK, Muñoz-Gómez SA, Wideman JG, Burki F, de Vargas C. EukProt: a database of genome-scale predicted proteins across the diversity of eukaryotes. Peer Community J. 2022:2:e56. https://doi.org/10.24072/pcjournal.173). LukProt adopts the EukProt naming conventions and includes data from 216 additional animals. The database is associated with a taxonomic grouping (taxogroup) scheme suitable for studying early animal evolution. Minor updates to the database will contain species additions or metadata corrections, whereas major updates will synchronize LukProt to each new version of EukProt, and releases are permanently stored on Zenodo (https://doi.org/10.5281/zenodo.7089120). A BLAST server to search the database is available at: https://lukprot.hirszfeld.pl/. Users are invited to participate in maintaining and correcting LukProt. As it can be searched without downloading locally, the database aims to be a convenient resource not only for evolutionary biologists, but for the broader scientific community as well.
Affiliation(s)
- Łukasz F Sobala
  - Laboratory of Glycobiology, Department of Immunochemistry, Hirszfeld Institute of Immunology and Experimental Therapy, PAS, Weigla 12, 53-114 Wrocław, Poland
3
Bricout R, Weil D, Stroebel D, Genovesio A, Roest Crollius H. Evolution is not Uniform Along Coding Sequences. Mol Biol Evol 2023; 40:7060063. [PMID: 36857092 PMCID: PMC10025431 DOI: 10.1093/molbev/msad042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 03/02/2023] Open
Abstract
Amino acids evolve at different speeds within protein sequences, because their functional and structural roles are different. Notably, amino acids located at the surface of proteins are known to evolve more rapidly than those in the core. In particular, amino acids at the N- and C-termini of protein sequences are likely to be more exposed than those at the core of the folded protein due to their location in the peptidic chain, and they are known to be less structured. For these reasons, we would expect amino acids located at protein termini to evolve faster than residues located inside the chain. Here we test this hypothesis and find that amino acids evolve almost twice as fast at protein termini compared with those in the center, hinting at a strong topological bias along the sequence length. We further show that the distribution of solvent-accessible residues and functional domains in proteins readily explains how structural and functional constraints are weaker at their termini, leading to the observed excess of amino acid substitutions. Finally, we show that the specific evolutionary rates at protein termini may have direct consequences, notably misleading in silico methods used to infer sites under positive selection within genes. These results suggest that accounting for positional information should improve evolutionary models.
Affiliation(s)
- Raphaël Bricout
  - Département de biologie, École normale supérieure, Institut de Biologie de l'ENS (IBENS), CNRS, INSERM, Paris, France
- Dominique Weil
  - Laboratoire de Biologie du Développement, Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Paris, France
- David Stroebel
  - Département de biologie, École normale supérieure, Institut de Biologie de l'ENS (IBENS), CNRS, INSERM, Paris, France
- Auguste Genovesio
  - Département de biologie, École normale supérieure, Institut de Biologie de l'ENS (IBENS), CNRS, INSERM, Paris, France
- Hugues Roest Crollius
  - Département de biologie, École normale supérieure, Institut de Biologie de l'ENS (IBENS), CNRS, INSERM, Paris, France
4
Padariya M, Jooste ML, Hupp T, Fåhraeus R, Vojtesek B, Vollrath F, Kalathiya U, Karakostis K. The Elephant evolved p53 isoforms that escape mdm2-mediated repression and cancer. Mol Biol Evol 2022; 39:6632613. [PMID: 35792674 PMCID: PMC9279639 DOI: 10.1093/molbev/msac149] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022] Open
Abstract
The p53 tumor suppressor is a transcription factor with roles in cell development, apoptosis, oncogenesis, aging, and homeostasis in response to stresses and infections. p53 is tightly regulated by the MDM2 E3 ubiquitin ligase. The p53–MDM2 pathway has coevolved, with MDM2 remaining largely conserved, whereas the TP53 gene morphed into various isoforms. Studies on prevertebrate ancestral homologs revealed the transition from an environmentally induced mechanism activating p53 to a tightly regulated system involving cell signaling. The evolution of this mechanism depends on structural changes in the interacting protein motifs. Elephants such as Loxodonta africana constitute ideal models to investigate this coevolution, as they are large and long-lived and have 20 copies of TP53 isoformic sequences expressing a variety of BOX-I MDM2-binding motifs. Collectively, these isoforms would enhance sensitivity to cellular stresses, such as DNA damage, presumably accounting for strong cancer defenses and other adaptations favoring healthy aging. Here we investigate the molecular evolution of the p53–MDM2 system by combining in silico modeling and in vitro assays to explore structural and functional aspects of p53 isoforms that retain the MDM2 interaction while forming distinct pools of cell signaling. The methodology used demonstrates, for the first time, that in silico docking simulations can be used to explore functional aspects of elephant p53 isoforms. Our observations elucidate structural and mechanistic aspects of p53 regulation, facilitate understanding of complex cell signaling, and suggest testable hypotheses of p53 evolution referencing Peto’s Paradox.
Affiliation(s)
- Monikaben Padariya
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, 80-822 Gdansk, Poland
- Mia-Lyn Jooste
  - Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XR, UK
- Ted Hupp
  - Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XR, UK
- Robin Fåhraeus
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, 80-822 Gdansk, Poland
  - Inserm UMRS1131, Institut de Génétique Moléculaire, Université Paris 7, Hôpital St. Louis, F-75010 Paris, France
  - Research Centre for Applied Molecular Oncology (RECAMO), Masaryk Memorial Cancer Institute, 65653 Brno, Czech Republic
  - Department of Medical Biosciences, Umeå University, 90185 Umeå, Sweden
- Borek Vojtesek
  - Research Centre for Applied Molecular Oncology (RECAMO), Masaryk Memorial Cancer Institute, 65653 Brno, Czech Republic
- Fritz Vollrath
  - Department of Zoology, Zoology Research and Administration Building, University of Oxford, Oxford, UK
  - Save the Elephants, Marula Manor, Marula Lane, Karen, P.O. Box 54667, Nairobi 00200, Kenya
- Umesh Kalathiya
  - International Centre for Cancer Vaccine Science, University of Gdansk, ul. Kładki 24, 80-822 Gdansk, Poland
- Konstantinos Karakostis
  - Inserm UMRS1131, Institut de Génétique Moléculaire, Université Paris 7, Hôpital St. Louis, F-75010 Paris, France
  - Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain
5
Möller L, Vainstein Y, Wöhlbrand L, Dörries M, Meyer B, Sohn K, Rabus R. Transcriptome-proteome compendium of the Antarctic krill (Euphausia superba): Metabolic potential and repertoire of hydrolytic enzymes. Proteomics 2022; 22:e2100404. [PMID: 35778945 DOI: 10.1002/pmic.202100404] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2021] [Revised: 06/24/2022] [Accepted: 06/28/2022] [Indexed: 11/06/2022]
Abstract
The Antarctic krill (Euphausia superba Dana) is a keystone species in the Southern Ocean that uses an arsenal of hydrolases for biomacromolecule decomposition to effectively digest its omnivorous diet. The present study builds on a hybrid-assembled transcriptome (13,671 ORFs) combined with comprehensive proteome profiling. Analyzing individual krill compartments detected significantly more distinct proteins than analyzing the entire animal (1,464 vs. 294 proteins). The nearby krill sampling stations in the Bransfield Strait (Antarctic Peninsula) yielded rather uniform proteome datasets. Proteins related to energy production and lipid degradation were particularly abundant in the abdomen, agreeing with the high energy demand of muscle tissue. A total of 378 different biomacromolecule-hydrolysing enzymes were detected, including 250 proteases, 99 CAZymes, 14 nucleases and 15 lipases. The large repertoire of proteases is in accord with the protein-rich diet associated with E. superba's omnivorous lifestyle and complex biology. The richness in chitin-degrading enzymes allows not only digestion of a zooplankton diet, but also utilization of the discharged exoskeleton after moulting.
Affiliation(s)
- Lars Möller
  - General and Molecular Microbiology, Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Yeheven Vainstein
  - In-Vitro-Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology (IGB), Stuttgart, Germany
- Lars Wöhlbrand
  - General and Molecular Microbiology, Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Marvin Dörries
  - General and Molecular Microbiology, Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
  - Biodiversity Change, Helmholtz Institute for Functional Marine Biodiversity at the University of Oldenburg (HIFMB), Oldenburg, Germany
- Bettina Meyer
  - Biodiversity and Biological Processes in Polar Oceans, Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
  - Ecophysiology of Pelagic Key Species, Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
  - Biodiversity Change, Helmholtz Institute for Functional Marine Biodiversity at the University of Oldenburg (HIFMB), Oldenburg, Germany
- Kai Sohn
  - In-Vitro-Diagnostics, Fraunhofer Institute for Interfacial Engineering and Biotechnology (IGB), Stuttgart, Germany
- Ralf Rabus
  - General and Molecular Microbiology, Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University of Oldenburg, Oldenburg, Germany