1
Eme L, Tamarit D. Microbial Diversity and Open Questions about the Deep Tree of Life. Genome Biol Evol 2024; 16:evae053. [PMID: 38620144 PMCID: PMC11018274 DOI: 10.1093/gbe/evae053] [Accepted: 03/11/2024] [Indexed: 04/17/2024]
Abstract
In this perspective, we explore the transformative impact and inherent limitations of metagenomics and single-cell genomics on our understanding of microbial diversity and their integration into the Tree of Life. We delve into the key challenges associated with incorporating new microbial lineages into the Tree of Life through advanced phylogenomic approaches. Additionally, we shed light on enduring debates surrounding various aspects of the microbial Tree of Life, focusing on recent advances in some of its deepest nodes, such as the roots of bacteria, archaea, and eukaryotes. We also bring forth current limitations in genome recovery and phylogenomic methodology, as well as new avenues of research to uncover additional key microbial lineages and resolve the shape of the Tree of Life.
Affiliation(s)
- Laura Eme
  - Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
- Daniel Tamarit
  - Theoretical Biology and Bioinformatics, Utrecht University, Utrecht 3584CH, The Netherlands
2
Sennett MA, Theobald DL. Extant Sequence Reconstruction: The Accuracy of Ancestral Sequence Reconstructions Evaluated by Extant Sequence Cross-Validation. J Mol Evol 2024; 92:181-206. [PMID: 38502220 PMCID: PMC10978691 DOI: 10.1007/s00239-024-10162-3] [Received: 07/12/2023] [Accepted: 02/20/2024] [Indexed: 03/21/2024]
Abstract
Ancestral sequence reconstruction (ASR) is a phylogenetic method widely used to analyze the properties of ancient biomolecules and to elucidate mechanisms of molecular evolution. Despite its increasingly widespread application, the accuracy of ASR is currently unknown, as it is generally impossible to compare resurrected proteins to the true ancestors. Which evolutionary models are best for ASR? How accurate are the resulting inferences? Here we answer these questions using a cross-validation method to reconstruct each extant sequence in an alignment with ASR methodology, a method we term "extant sequence reconstruction" (ESR). We thus can evaluate the accuracy of ASR methodology by comparing ESR reconstructions to the corresponding known true sequences. We find that a common measure of the quality of a reconstructed sequence, the average probability, is indeed a good estimate of the fraction of correct amino acids when the evolutionary model is accurate or overparameterized. However, the average probability is a poor measure for comparing reconstructions from different models, because, surprisingly, a more accurate phylogenetic model often results in reconstructions with lower probability. While better (more predictive) models may produce reconstructions with lower sequence identity to the true sequences, better models nevertheless produce reconstructions that are more biophysically similar to true ancestors. In addition, we find that a large fraction of sequences sampled from the reconstruction distribution may have fewer errors than the single most probable (SMP) sequence reconstruction, despite the fact that the SMP has the lowest expected error of all possible sequences. Our results emphasize the importance of model selection for ASR and the usefulness of sampling sequence reconstructions for analyzing ancestral protein properties. 
ESR is a powerful method for validating the evolutionary models used for ASR and can be applied in practice to any phylogenetic analysis of real biological sequences. Most significantly, ESR uses ASR methodology to provide a general method by which the biophysical properties of resurrected proteins can be compared to the properties of the true protein.
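The quantities named in this abstract (the single most probable sequence, its average probability, and sequences sampled from the reconstruction distribution) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `posterior` table, the held-out `truth` sequence, and all function names are hypothetical, and a real analysis would take site-wise posterior probabilities from a phylogenetics program.

```python
import random

# Hypothetical site-wise posterior: one dict per site, amino acid -> probability.
posterior = [
    {"A": 0.90, "G": 0.10},
    {"L": 0.55, "I": 0.45},
    {"K": 0.60, "R": 0.40},
]


def smp_sequence(post):
    """Single most probable (SMP) sequence: the top-ranked residue at every site."""
    return "".join(max(site, key=site.get) for site in post)


def average_probability(post, seq):
    """Mean posterior probability of the residues chosen in seq."""
    return sum(site.get(res, 0.0) for site, res in zip(post, seq)) / len(post)


def sample_sequence(post, rng):
    """Draw one sequence from the reconstruction distribution, site by site."""
    return "".join(
        rng.choices(list(site), weights=list(site.values()))[0] for site in post
    )


def fraction_correct(seq, truth):
    """Fraction of sites at which seq matches the known true sequence."""
    return sum(a == b for a, b in zip(seq, truth)) / len(truth)


if __name__ == "__main__":
    rng = random.Random(0)
    truth = "AIK"  # hypothetical extant sequence held out for cross-validation
    smp = smp_sequence(posterior)
    print(smp, average_probability(posterior, smp), fraction_correct(smp, truth))
    for _ in range(3):
        sampled = sample_sequence(posterior, rng)
        print(sampled, fraction_correct(sampled, truth))
```

Comparing `average_probability` with `fraction_correct` for a held-out extant sequence is the kind of check the abstract describes; the toy numbers here carry no biological meaning.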
Affiliation(s)
- Michael A Sennett
  - Department of Biochemistry, Brandeis University, Waltham, MA, 02453, USA
- Douglas L Theobald
  - Department of Biochemistry, Brandeis University, Waltham, MA, 02453, USA
3
Ferreiro D, Branco C, Arenas M. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024]
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. The SCS models often consider site-dependent evolution, a process that provides realism but complicates their implementation into likelihood functions that are commonly used for substitution model selection.
RESULTS: We present a method to perform selection among site-dependent SCS models, also among empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation into the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families observing that SCS models fit them better than the corresponding best-fitting empirical substitution models.
AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
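For readers unfamiliar with ABC-based model selection, the general idea can be sketched in Python with a toy rejection sampler. This is not ProteinModelerABC and does not use its interface: the two candidate "models" (fixed per-site substitution probabilities), the summary statistic, and every name below are assumptions made for illustration, and only the plain rejection variant is shown, without regression adjustment.

```python
import random


def simulate_summary(p_sub, n_sites, rng):
    """Simulate one toy data set and return its summary statistic:
    the fraction of sites that carry a substitution."""
    return sum(rng.random() < p_sub for _ in range(n_sites)) / n_sites


def abc_model_choice(observed, models, n_sims, tolerance, n_sites, rng):
    """ABC rejection for model choice.

    Each simulation draws a model from a uniform prior, simulates a summary
    statistic, and is accepted if it lies within `tolerance` of `observed`.
    The share of accepted simulations per model approximates its posterior
    probability.
    """
    accepted = {name: 0 for name in models}
    names = list(models)
    for _ in range(n_sims):
        name = rng.choice(names)
        simulated = simulate_summary(models[name], n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted[name] += 1
    total = sum(accepted.values())
    return {
        name: (count / total if total else float("nan"))
        for name, count in accepted.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical candidate models: per-site substitution probabilities.
    candidates = {"low_rate": 0.1, "high_rate": 0.3}
    print(abc_model_choice(0.28, candidates, 2000, 0.03, 200, rng))
```

The appeal of this scheme for site-dependent models is that it only needs the ability to simulate data, not to evaluate a likelihood function.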
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Catarina Branco
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
4
Ferreiro D, Khalil R, Sousa SF, Arenas M. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024; 41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024]
Abstract
Substitution models of evolution are necessary for diverse evolutionary analyses including phylogenetic tree and ancestral sequence reconstructions. At the protein level, empirical substitution models are traditionally used due to their simplicity, but they ignore the variability of substitution patterns among protein sites. Next, in order to improve the realism of the modeling of protein evolution, a series of structurally constrained substitution models were presented, but still they usually ignore constraints on the protein activity. Here, we present a substitution model of protein evolution with selection on both protein structure and enzymatic activity, and that can be applied to phylogenetics. In particular, the model considers the binding affinity of the enzyme-substrate complex as well as structural constraints that include the flexibility of structural flaps, hydrogen bonds, amino acids backbone radius of gyration, and solvent-accessible surface area that are quantified through molecular dynamics simulations. We applied the model to the HIV-1 protease and evaluated it by phylogenetic likelihood in comparison with the best-fitting empirical substitution model and a structurally constrained substitution model that ignores the enzymatic activity. We found that accounting for selection on the protein activity improves the fitting of the modeled functional regions with the real observations, especially in data with high molecular identity, which recommends considering constraints on the protein activity in the development of substitution models of evolution.
Affiliation(s)
- David Ferreiro
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Ruqaiya Khalil
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
- Sergio F Sousa
  - UCIBIO/REQUIMTE, BioSIM, Departamento de Biomedicina, Faculdade de Medicina da Universidade do Porto, 4200-319 Porto, Portugal
- Miguel Arenas
  - CINBIO, Universidade de Vigo, 36310 Vigo, Spain
  - Department of Biochemistry, Genetics and Immunology, Universidade de Vigo, 36310 Vigo, Spain
5
Javed A, Habib S, Ayub A. Evolution of protein domain repertoires of CALHM6. PeerJ 2024; 12:e16063. [PMID: 38188152 PMCID: PMC10768655 DOI: 10.7717/peerj.16063] [Received: 05/19/2023] [Accepted: 08/18/2023] [Indexed: 01/09/2024]
Abstract
Calcium (Ca2+) homeostasis is essential in conducting various cellular processes including nerve transmission, muscular movement, and immune response. Changes in Ca2+ concentration in the cytoplasm are significant in bringing about various immune responses such as pathogen clearance and apoptosis. Various key players are involved in calcium homeostasis such as calcium binders, pumps, and channels. Sequence-based evolutionary information has recently been exploited to predict the biophysical behaviors of proteins, giving critical clues about their functionality. Ion channels are reportedly the first channels developed during evolution. Calcium homeostasis modulator protein 6 (CALHM6) is one such channel. Comprised of a single domain called Ca_hom_mod, CALHM6 is a stable protein interacting with various other proteins in calcium regulation. No previous attempt has been made to trace the exact evolutionary events in the domain of CALHM6, leaving plenty of room for exploring its evolution across a wide range of organisms. The current study aims to answer the questions by employing a computational-based strategy that used profile Hidden Markov Models (HMMs) to scan for the CALHM6 domain, integrated the data with a time-calibrated phylogenetic tree using BEAST and Mesquite, and visualized through iTOL. Around 4,000 domains were identified, and 14,000 domain gain, loss, and duplication events were observed at the end which also included various protein domains other than CALHM6. The data were analyzed concerning CALHM6 evolution as well as the domain gain, loss, and duplication of its interacting partners: Calpain, Vinculin, protein S100-A7, Thioredoxin, Peroxiredoxin, and Calmodulin-like protein 5. Duplication events of CALHM6 near higher eukaryotes showed its increasing complexity in structure and function. This in-silico phylogenetic approach applied to trace the evolution of CALHM6 was an effective approach to get a better understanding of the protein CALHM6.
Affiliation(s)
- Aneela Javed
  - Molecular Immunology Laboratory, Department of Healthcare Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Sabahat Habib
  - Molecular Immunology Laboratory, Department of Healthcare Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Aaima Ayub
  - Molecular Immunology Laboratory, Department of Healthcare Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
6
Aledo P, Aledo JC. Proteome-Wide Structural Computations Provide Insights into Empirical Amino Acid Substitution Matrices. Int J Mol Sci 2023; 24:ijms24010796. [PMID: 36614247 PMCID: PMC9821064 DOI: 10.3390/ijms24010796] [Received: 11/07/2022] [Revised: 12/24/2022] [Accepted: 12/29/2022] [Indexed: 01/04/2023]
Abstract
The relative contribution of mutation and selection to the amino acid substitution rates observed in empirical matrices is unclear. Herein, we present a neutral continuous fitness-stability model, inspired by the Arrhenius law ($q_{ij} = a_{ij}\, e^{-\Delta\Delta G_{ij}}$). The model postulates that the rate of amino acid substitution ($i \to j$) is determined by the product of a pre-exponential factor, which is influenced by the genetic code structure, and an exponential term reflecting the relative fitness of the amino acid substitutions. To assess the validity of our model, we computed changes in stability of 14,094 proteins, for which 137,073,638 in silico mutants were analyzed. These site-specific data were summarized into a 20 × 20 matrix, whose entries, $\Delta\Delta G_{ij}$, were obtained after averaging through all the sites in all the proteins. We found a significant positive correlation between these energy values and the disease-causing potential of each substitution, suggesting that the exponential term accurately summarizes the fitness effect. A remarkable observation was that amino acids that were highly destabilizing when acting as the source, tended to have little effect when acting as the destination, and vice versa (source → destination). The Arrhenius model accurately reproduced the pattern of substitution rates collected in the empirical matrices, suggesting a relevant role for the genetic code structure and a tuning role for purifying selection exerted via protein stability.
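The Arrhenius-like relation stated in this abstract, in which the rate for a substitution i → j is a pre-exponential factor a_ij multiplied by exp(-ΔΔG_ij), can be evaluated with a short Python sketch. The numeric values below are invented for illustration and are not taken from the paper; the sketch also assumes ΔΔG_ij is expressed in units that make the exponent dimensionless, since the abstract shows no explicit scaling term.

```python
import math


def substitution_rate(a_ij, ddg_ij):
    """Rate q_ij = a_ij * exp(-ddG_ij) for a substitution i -> j."""
    return a_ij * math.exp(-ddg_ij)


def rate_matrix(a, ddg):
    """Apply the relation to every off-diagonal (i, j) pair.

    `a` and `ddg` are dicts keyed by (source, destination) amino acid pairs.
    """
    return {pair: substitution_rate(a[pair], ddg[pair]) for pair in a}


if __name__ == "__main__":
    # Hypothetical pre-exponential factors and stability changes.
    a = {("A", "G"): 1.0, ("A", "W"): 1.0, ("G", "A"): 0.5}
    ddg = {("A", "G"): 0.5, ("A", "W"): 2.0, ("G", "A"): 0.1}
    for pair, rate in rate_matrix(a, ddg).items():
        print(pair, round(rate, 4))
```

With equal pre-exponential factors, the more destabilizing substitution receives the lower rate, which is the tuning role the abstract attributes to purifying selection on stability.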
7
Ayuso-Fernández I, Molpeceres G, Camarero S, Ruiz-Dueñas FJ, Martínez AT. Ancestral sequence reconstruction as a tool to study the evolution of wood decaying fungi. Frontiers in Fungal Biology 2022; 3:1003489. [PMID: 37746217 PMCID: PMC10512382 DOI: 10.3389/ffunb.2022.1003489] [Received: 07/26/2022] [Accepted: 09/22/2022] [Indexed: 09/26/2023]
Abstract
The study of evolution is limited by the techniques available to do so. Aside from the use of the fossil record, molecular phylogenetics can provide a detailed characterization of evolutionary histories using genes, genomes and proteins. However, these tools provide scarce biochemical information of the organisms and systems of interest and are therefore very limited when they come to explain protein evolution. In the past decade, this limitation has been overcome by the development of ancestral sequence reconstruction (ASR) methods. ASR allows the subsequent resurrection in the laboratory of inferred proteins from now extinct organisms, becoming an outstanding tool to study enzyme evolution. Here we review the recent advances in ASR methods and their application to study fungal evolution, with special focus on wood-decay fungi as essential organisms in the global carbon cycling.
Affiliation(s)
- Iván Ayuso-Fernández
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Gonzalo Molpeceres
  - Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
- Susana Camarero
  - Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
- Angel T. Martínez
  - Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain