1
Vakirlis N, Acar O, Cherupally V, Carvunis AR. Ancestral Sequence Reconstruction as a Tool to Detect and Study De Novo Gene Emergence. Genome Biol Evol 2024; 16:evae151. [PMID: 39004885 PMCID: PMC11299112 DOI: 10.1093/gbe/evae151] [Received: 01/03/2024] [Revised: 06/17/2024] [Accepted: 07/09/2024] [Indexed: 07/16/2024]
Abstract
New protein-coding genes can evolve from previously noncoding genomic regions through a process known as de novo gene emergence. Evidence suggests that this process has likely occurred throughout evolution and across the tree of life. Yet, confidently identifying de novo emerged genes remains challenging. Ancestral sequence reconstruction is a promising approach for inferring whether a gene has emerged de novo or not, as it allows us to inspect whether a given genomic locus ancestrally harbored protein-coding capacity. However, the use of ancestral sequence reconstruction in the context of de novo emergence is still in its infancy and its capabilities, limitations, and overall potential are largely unknown. Notably, it is difficult to formally evaluate the protein-coding capacity of ancestral sequences, particularly when new gene candidates are short. How well-suited is ancestral sequence reconstruction as a tool for the detection and study of de novo genes? Here, we address this question by designing an ancestral sequence reconstruction workflow incorporating different tools and sets of parameters and by introducing a formal criterion that allows us to estimate, within a desired level of confidence, when protein-coding capacity originated at a particular locus. Applying this workflow to ∼2,600 short, annotated budding yeast genes (<1,000 nucleotides), we found that ancestral sequence reconstruction robustly predicts an ancient origin for the most widely conserved genes, which constitute "easy" cases. For less robust cases, we calculated a randomization-based empirical P-value estimating whether the observed conservation between the extant and ancestral reading frame could be attributed to chance. This formal criterion allowed us to pinpoint a branch of origin for most of the less robust cases, identifying 49 genes that can unequivocally be considered de novo originated since the split of the Saccharomyces genus, including 37 Saccharomyces cerevisiae-specific genes. We find that for the remaining equivocal cases we cannot rule out different evolutionary scenarios including rapid evolution, multiple gene losses, or a recent de novo origin. Overall, our findings suggest that ancestral sequence reconstruction is a valuable tool to study de novo gene emergence but should be applied with caution and awareness of its limitations.
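The randomization-based empirical P-value mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' workflow: the function names (`identity`, `empirical_p_value`), the use of simple positional identity as the conservation measure, and the choice of shuffling the ancestral sequence as the null model are all assumptions made for illustration only.

```python
import random


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def empirical_p_value(extant: str, ancestral: str,
                      n_perm: int = 999, seed: int = 0) -> float:
    """Hypothetical randomization test (an assumption, not the paper's method).

    The null distribution is built by shuffling the ancestral sequence and
    recomputing its identity to the extant sequence. The +1 terms count the
    observed value as one draw from the null, so the result is never zero.
    """
    rng = random.Random(seed)
    observed = identity(extant, ancestral)
    chars = list(ancestral)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(chars)  # in-place permutation of the ancestral residues
        if identity(extant, "".join(chars)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable value is 1/1000, which is why the number of randomizations bounds the confidence level that such a criterion can express.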
Affiliation(s)
- Nikolaos Vakirlis
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Omer Acar
- Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Vijay Cherupally
- Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Anne-Ruxandra Carvunis
- Pittsburgh Center for Evolutionary Biology and Medicine, Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
2
Dereli O, Kuru N, Akkoyun E, Bircan A, Tastan O, Adebali O. PHACTboost: A Phylogeny-Aware Pathogenicity Predictor for Missense Mutations via Boosting. Mol Biol Evol 2024; 41:msae136. [PMID: 38934805 PMCID: PMC11251492 DOI: 10.1093/molbev/msae136] [Received: 12/12/2023] [Revised: 05/30/2024] [Accepted: 06/24/2024] [Indexed: 06/28/2024]
Abstract
Most algorithms that are used to predict the effects of variants rely on evolutionary conservation. However, a majority of such techniques compute evolutionary conservation solely from the alignment of multiple sequences, overlooking the evolutionary context of substitution events. In a previous study, we introduced PHACT, a scoring-based pathogenicity predictor for missense mutations that can leverage phylogenetic trees. Building on this foundation, we now propose PHACTboost, a gradient boosting tree-based classifier that combines PHACT scores with information from multiple sequence alignments, phylogenetic trees, and ancestral reconstruction. By learning from data, PHACTboost outperforms PHACT. Furthermore, the results of comprehensive experiments on carefully constructed sets of variants demonstrated that PHACTboost can outperform 40 prevalent pathogenicity predictors reported in the dbNSFP, including conventional tools, metapredictors, and deep learning-based approaches as well as more recent tools such as AlphaMissense, EVE, and CPT-1. The superiority of PHACTboost over these methods was particularly evident in the case of hard variants for which different pathogenicity predictors offered conflicting results. We provide predictions of 215 million amino acid alterations over 20,191 proteins. PHACTboost is available at https://github.com/CompGenomeLab/PHACTboost. PHACTboost can improve our understanding of genetic diseases and facilitate more accurate diagnoses.
Affiliation(s)
- Onur Dereli
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Nurdan Kuru
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Emrah Akkoyun
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Network Technologies Department, TÜBİTAK-ULAKBİM Turkish Academic Network and Information Center, Ankara 06530, Turkey
- Aylin Bircan
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Oznur Tastan
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Ogün Adebali
- Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul 34956, Turkey
- Biological Sciences, TÜBİTAK Research Institute for Fundamental Sciences, Gebze 41470, Turkey
3
Iglhaut C, Pečerska J, Gil M, Anisimova M. Please Mind the Gap: Indel-Aware Parsimony for Fast and Accurate Ancestral Sequence Reconstruction and Multiple Sequence Alignment Including Long Indels. Mol Biol Evol 2024; 41:msae109. [PMID: 38842253 PMCID: PMC11221656 DOI: 10.1093/molbev/msae109] [Received: 03/25/2024] [Revised: 05/30/2024] [Accepted: 06/03/2024] [Indexed: 06/07/2024]
Abstract
Despite having important biological implications, insertion and deletion (indel) events are often disregarded or mishandled during phylogenetic inference. In multiple sequence alignment, indels are represented as gaps and are estimated without considering the distinct evolutionary history of insertions and deletions. Consequently, indels are usually excluded from subsequent inference steps, such as ancestral sequence reconstruction and phylogenetic tree search. Here, we introduce indel-aware parsimony (indelMaP), a novel way to treat gaps under the parsimony criterion by considering insertions and deletions as separate evolutionary events and accounting for long indels. By identifying the precise location of an evolutionary event on the tree, we can separate overlapping indel events and use affine gap penalties for long indel modeling. Our indel-aware approach harnesses the phylogenetic signal from indels, incorporating them into all inference stages. Validation and comparison to state-of-the-art inference tools on simulated data show that indelMaP is most suitable for densely sampled datasets with closely to moderately related sequences, where it can reach alignment quality comparable to probabilistic methods and accurately infer ancestral sequences, including indel patterns. Due to its remarkable speed, our method is well suited for epidemiological datasets, eliminating the need for downsampling and enabling the exploitation of the additional information provided by dense taxonomic sampling. Moreover, indelMaP offers new insights into the indel patterns of biologically significant sequences and advances our understanding of genetic variability by considering gaps as crucial evolutionary signals rather than mere artefacts.
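The affine gap penalties mentioned in the abstract above can be sketched generically. The code below is not indelMaP's implementation: the function names, the penalty values, and the convention that the opening cost already covers the first gap position are assumptions chosen for illustration.

```python
from itertools import groupby


def affine_gap_cost(length: int, gap_open: float = 2.0,
                    gap_extend: float = 0.5) -> float:
    """Cost of one gap of the given length under an affine scheme.

    Assumed convention: gap_open pays for the first gapped position and each
    additional position adds gap_extend, so one long indel is cheaper than
    several short ones covering the same number of positions.
    """
    if length <= 0:
        return 0.0
    return gap_open + (length - 1) * gap_extend


def total_gap_cost(aligned_row: str, gap_open: float = 2.0,
                   gap_extend: float = 0.5) -> float:
    """Sum the affine cost over every run of '-' in one aligned sequence."""
    total = 0.0
    for is_gap, run in groupby(aligned_row, key=lambda c: c == "-"):
        if is_gap:
            total += affine_gap_cost(sum(1 for _ in run), gap_open, gap_extend)
    return total
```

For example, under these assumed values a single gap of length 3 costs 3.0, whereas three separate gaps of length 1 cost 6.0, which is the property that makes affine schemes suitable for modeling long indels as single events.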
Affiliation(s)
- Clara Iglhaut
- Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
- Faculty of Mathematics and Science, University of Zurich, Zürich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Jūlija Pečerska
- Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Manuel Gil
- Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Maria Anisimova
- Institute of Computational Life Science, Zurich University of Applied Science, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
4
Dorji J, Reverter A, Alexandre PA, Chamberlain AJ, Vander-Jagt CJ, Kijas J, Porto-Neto LR. Ancestral alleles defined for 70 million cattle variants using a population-based likelihood ratio test. Genet Sel Evol 2024; 56:11. [PMID: 38321371 PMCID: PMC10848479 DOI: 10.1186/s12711-024-00879-6] [Received: 06/02/2023] [Accepted: 01/30/2024] [Indexed: 02/08/2024]
Abstract
BACKGROUND: The study of ancestral alleles provides insights into the evolutionary history, selection, and genetic structures of a population. In cattle, ancestral alleles are widely used in genetic analyses, including the detection of signatures of selection, determination of breed ancestry, and identification of admixture. Having a comprehensive list of ancestral alleles is expected to improve the accuracy of these genetic analyses. However, the list of ancestral alleles in cattle, especially at the whole genome sequence level, is far from complete. In fact, the largest current list of ancestral alleles (~42 million) represents less than 28% of the total number of detected variants in cattle. To address this issue and develop a genomic resource for evolutionary studies, we determined ancestral alleles in cattle by comparing previously derived whole-genome sequence variants to an out-species group using a population-based likelihood ratio test.
RESULTS: Our study determined and makes available the largest list of ancestral alleles in cattle to date (70.1 million), including 2.3 million on the X chromosome. There was high concordance (97.6%) of the determined ancestral alleles with those from previous studies when only high-probability ancestral alleles were considered (29.8 million positions), and another 23.5 million high-confidence ancestral alleles were novel, expanding the available reference list to improve the accuracy of genetic analyses involving ancestral alleles. The high concordance of the results with previous studies implies that our approach using genomic sequence variants and a likelihood ratio test to determine ancestral alleles is appropriate.
CONCLUSIONS: Considering the high concordance of ancestral alleles across studies, the ancestral alleles determined in this study, including those not previously listed and particularly those with high-probability estimates, may be used for further genetic analyses with reasonable accuracy. Our approach, which used predetermined variants within the species and a likelihood ratio test to determine ancestral alleles, is applicable to other species for which sequence-level genotypes are available.
Affiliation(s)
- Jigme Dorji
- CSIRO, Agriculture & Food, St. Lucia, QLD, 4067, Australia.
- Amanda J Chamberlain
- AgriBio, Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
- Christy J Vander-Jagt
- AgriBio, Centre for AgriBioscience, Agriculture Victoria, Bundoora, VIC, 3083, Australia
- James Kijas
- CSIRO, Agriculture & Food, St. Lucia, QLD, 4067, Australia
5
Wygoda E, Loewenthal G, Moshe A, Alburquerque M, Mayrose I, Pupko T. Statistical framework to determine indel-length distribution. Bioinformatics 2024; 40:btae043. [PMID: 38269647 PMCID: PMC10868340 DOI: 10.1093/bioinformatics/btae043] [Received: 06/21/2023] [Revised: 01/10/2024] [Accepted: 01/22/2024] [Indexed: 01/26/2024]
Abstract
MOTIVATION: Insertions and deletions (indels) of short DNA segments, along with substitutions, are the most frequent molecular evolutionary events. Indels were shown to affect numerous macro-evolutionary processes. Because indels may span multiple positions, their impact is a product of both their rate and their length distribution. An accurate inference of indel-length distribution is important for multiple evolutionary and bioinformatics applications, most notably for alignment software. Previous studies counted the number of continuous gap characters in alignments to determine the best-fitting length distribution. However, gap-counting methods are not statistically rigorous, as gap blocks are not synonymous with indels. Furthermore, such methods rely on alignments that regularly contain errors and are biased due to the assumption of alignment methods that indel lengths follow a geometric distribution.
RESULTS: We aimed to determine which indel-length distribution best characterizes alignments using statistically rigorous methodologies. To this end, we reduced the alignment bias using a machine-learning algorithm and applied an Approximate Bayesian Computation methodology for model selection. Moreover, we developed a novel method to test if current indel models provide an adequate representation of the evolutionary process. We found that the best-fitting model varies among alignments, with a Zipf length distribution fitting the vast majority of them.
AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available on GitHub, at https://github.com/elyawy/SpartaSim and https://github.com/elyawy/SpartaPipeline.
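The gap-counting approach that the abstract above describes as used in previous studies (and criticizes) can be sketched as follows. This illustrates only that naive baseline, not the authors' Approximate Bayesian Computation method; the function name `gap_block_lengths` and the toy alignment are hypothetical.

```python
import re
from collections import Counter


def gap_block_lengths(alignment: list[str]) -> Counter:
    """Histogram of contiguous gap-run lengths across all alignment rows.

    Each maximal run of '-' in a row is counted as one block. As the abstract
    notes, such blocks are not synonymous with indel events: a single block
    may reflect several overlapping insertions and deletions.
    """
    counts: Counter = Counter()
    for row in alignment:
        for match in re.finditer(r"-+", row):
            counts[len(match.group())] += 1
    return counts


# Toy alignment (hypothetical data): one block each of length 1, 2, and 3.
toy = ["AC--GT-A", "A---GTCA"]
histogram = gap_block_lengths(toy)
```

A length distribution fitted to such a histogram inherits any alignment errors and the aligner's own gap-length assumptions, which is the bias the abstract sets out to reduce.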
Affiliation(s)
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Michael Alburquerque
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
6
Lemieux P, Bradley D, Dubé AK, Dionne U, Landry CR. Dissection of the role of a Src homology 3 domain in the evolution of binding preference of paralogous proteins. Genetics 2024; 226:iyad175. [PMID: 37793087 PMCID: PMC10763533 DOI: 10.1093/genetics/iyad175] [Received: 07/07/2023] [Revised: 07/07/2023] [Accepted: 08/07/2023] [Indexed: 10/06/2023]
Abstract
Protein-protein interactions (PPIs) drive many cellular processes. Some interactions are directed by Src homology 3 (SH3) domains that bind proline-rich motifs on other proteins. The evolution of the binding specificity of SH3 domains is not completely understood, particularly following gene duplication. Paralogous genes accumulate mutations that can modify protein functions and, for SH3 domains, their binding preferences. Here, we examined how the binding of the SH3 domains of 2 paralogous yeast type I myosins, Myo3 and Myo5, evolved following duplication. We found that the paralogs have subtly different SH3-dependent interaction profiles. However, by swapping SH3 domains between the paralogs and characterizing the SH3 domains freed from their protein context, we find that very few of the differences in interactions, if any, depend on the SH3 domains themselves. We used ancestral sequence reconstruction to resurrect the preduplication SH3 domains and examined, moving back in time, how the binding preference changed. Although the most recent ancestor of the 2 domains had a binding preference very similar to that of the extant ones, older ancestral domains displayed a gradual loss of interaction with the modern interaction partners when inserted in the extant paralogs. Molecular docking and experimental characterization of the free ancestral domains showed that their affinity for the proline motifs is likely not the cause of this loss of binding. Taken together, our results suggest that an SH3 domain and its host protein could create intramolecular or allosteric interactions essential for the SH3-dependent PPIs, making domains not functionally equivalent even when they have the same binding specificity.
Affiliation(s)
- Pascale Lemieux
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- David Bradley
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biologie, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Alexandre K Dubé
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biologie, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Ugo Dionne
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de Recherche du Centre Hospitalier Universitaire (CHU) de Québec, Université Laval, Québec, QC, Canada G1R 2J6
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada M5G 1X5
- Christian R Landry
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Regroupement Québécois de Recherche sur la Fonction, l’Ingénierie et les Applications des Protéines, (PROTEO), Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Centre de recherche en données massives (CRDM), Université Laval, 1065, Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biochimie, microbiologie et bio-informatique, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
- Département de biologie, Université Laval, 1045 Avenue de la Médecine, Québec, QC, Canada G1V 0A6
7
Mönttinen HAM, Frilander MJ, Löytynoja A. Generation of de novo miRNAs from template switching during DNA replication. Proc Natl Acad Sci U S A 2023; 120:e2310752120. [PMID: 38019864 PMCID: PMC10710096 DOI: 10.1073/pnas.2310752120] [Received: 06/26/2023] [Accepted: 11/01/2023] [Indexed: 12/01/2023]
Abstract
The mechanisms generating novel genes and genetic information are poorly known, even for microRNA (miRNA) genes with an extremely constrained design. All miRNA primary transcripts need to fold into a stem-loop structure to yield short gene products (∼22 nt) that bind and repress their mRNA targets. While a substantial number of miRNA genes are ancient and highly conserved, short secondary structures coding for entirely novel miRNA genes have been shown to emerge in a lineage-specific manner. Template switching is a DNA-replication-related mutation mechanism that can introduce complex changes and generate perfect base pairing for entire hairpin structures in a single event. Here, we show that template-switching mutations (TSMs) have participated in the emergence of over 6,000 suitable hairpin structures in the primate lineage to yield at least 18 new human miRNA genes, that is, 26% of the miRNAs inferred to have arisen since the origin of primates. While the mechanism appears random, the TSM-generated miRNAs are enriched in introns where they can be expressed with their host genes. The high frequency of TSM events provides raw material for evolution. Being orders of magnitude faster than other mechanisms proposed for de novo creation of genes, TSM-generated miRNAs enable near-instant rewiring of genetic information and rapid adaptation to changing environments.
Affiliation(s)
- Heli A. M. Mönttinen
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
- Mikko J. Frilander
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
- Ari Löytynoja
- Institute of Biotechnology, Helsinki Institute of Life Science, University of Helsinki, HelsinkiFI-000, Finland
8
Hansen MH, Adamek M, Iftime D, Petras D, Schuseil F, Grond S, Stegmann E, Cryle MJ, Ziemert N. Resurrecting ancestral antibiotics: unveiling the origins of modern lipid II targeting glycopeptides. Nat Commun 2023; 14:7842. [PMID: 38030603 PMCID: PMC10687080 DOI: 10.1038/s41467-023-43451-4] [Received: 08/10/2023] [Accepted: 11/09/2023] [Indexed: 12/01/2023]
Abstract
Antibiotics are central to modern medicine, and yet they are mainly the products of intra- and inter-kingdom evolutionary warfare. To understand how nature evolves antibiotics around a common mechanism of action, we investigated the origins of an extremely valuable class of compounds, lipid II targeting glycopeptide antibiotics (GPAs, exemplified by teicoplanin and vancomycin), which are used as a last resort for the treatment of antibiotic-resistant bacterial infections. Using a molecule-centred approach and computational techniques, we first predicted the nonribosomal peptide synthetase assembly line of paleomycin, the ancestral parent of lipid II targeting GPAs. Subsequently, we employed synthetic biology techniques to produce the predicted peptide and validated its antibiotic activity. We revealed the structure of paleomycin, which enabled us to address how nature morphs a peptide antibiotic scaffold through evolution. In doing so, we obtained temporal snapshots of key selection domains in nonribosomal peptide synthesis during the biosynthetic journey from ancestral, teicoplanin-like GPAs to modern GPAs such as vancomycin. Our study demonstrates the synergy of computational techniques and synthetic biology approaches, enabling us to journey back in time, trace the temporal evolution of antibiotics, and revive these ancestral molecules. It also reveals the optimisation strategies nature has applied to evolve modern GPAs, laying the foundation for future efforts to engineer this important class of antimicrobial agents.
Affiliation(s)
- Mathias H Hansen
- Department of Biochemistry and Molecular Biology, The Monash Biomedicine Discovery Institute, Monash University, Clayton, VIC, 3800, Australia
- EMBL Australia, Monash University, Clayton, VIC, 3800, Australia
- ARC Centre of Excellence for Innovations in Peptide and Protein Science, Monash University, Clayton, VIC, 3800, Australia
- Martina Adamek
- Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany
- German Centre for Infection Research (DZIF), Partner Site Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Dumitrita Iftime
- Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany
- Daniel Petras
- Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany
- Frauke Schuseil
- Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany
- German Centre for Infection Research (DZIF), Partner Site Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Stephanie Grond
- Institute of Organic Chemistry, University of Tübingen, Tübingen, Germany
- Evi Stegmann
- Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany.
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany.
- Max J Cryle
- Department of Biochemistry and Molecular Biology, The Monash Biomedicine Discovery Institute, Monash University, Clayton, VIC, 3800, Australia.
- EMBL Australia, Monash University, Clayton, VIC, 3800, Australia.
- ARC Centre of Excellence for Innovations in Peptide and Protein Science, Monash University, Clayton, VIC, 3800, Australia.
- Nadine Ziemert
- Interfaculty Institute of Microbiology and Infection Medicine Tübingen, Cluster of Excellence 'Controlling Microbes to Fight Infections', University of Tübingen, Tübingen, Germany.
- German Centre for Infection Research (DZIF), Partner Site Tübingen, Tübingen, Germany.
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany.
9
de Oliveira SG, Kotowski N, Sampaio-Filho HR, Aguiar FHB, Dávila AMR, Jardim R. Metalloproteinases in Restorative Dentistry: An In Silico Study toward an Ideal Animal Model. Biomedicines 2023; 11:3042. [PMID: 38002041 PMCID: PMC10669239 DOI: 10.3390/biomedicines11113042] [Received: 07/09/2023] [Revised: 09/02/2023] [Accepted: 09/13/2023] [Indexed: 11/26/2023]
Abstract
In dentistry, various animal models are used to evaluate adhesive systems, dental caries and periodontal diseases. Metalloproteinases (MMPs) are enzymes that degrade collagen in the dentin matrix and are categorized in over 20 different classes. Collagenases and gelatinases are intrinsic constituents of the human dentin organic matrix fibrillar network and are the most abundant MMPs in this tissue. Understanding the action of these enzymes on dentin is important for developing approaches that could reduce dentin degradation and provide restorative procedures with extended longevity. This in silico study considers the animal models most used in dentistry and aims to identify the most suitable one, that is, the model evolutionarily closest to Homo sapiens. We retrieved 176,077 mammalian MMP sequences from the UniProt database. These sequences were manually curated through a three-step process. After curation, the remaining 3,178 sequences were aligned in a multifasta file and phylogenetically reconstructed using the maximum likelihood method. Our study inferred that the animal models most evolutionarily related to Homo sapiens were Oryctolagus cuniculus (MMP-1 and MMP-8), Canis lupus (MMP-13), Rattus norvegicus (MMP-2) and Oryctolagus cuniculus (MMP-9). Further research will be needed for the biological validation of our findings.
Affiliation(s)
- Simone Gomes de Oliveira
- Piracicaba School of Dentistry, Campinas State University, Piracicaba 13414-903, SP, Brazil
- School of Dentistry, State University of Rio de Janeiro, Rio de Janeiro 20551-030, RJ, Brazil
- Nelson Kotowski
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro 21040-900, RJ, Brazil; (N.K.); (A.M.R.D.)
- Alberto Martín Rivera Dávila
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro 21040-900, RJ, Brazil; (N.K.); (A.M.R.D.)
- Rodrigo Jardim
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro 21040-900, RJ, Brazil; (N.K.); (A.M.R.D.)
10
Lytras S, Wickenhagen A, Sugrue E, Stewart DG, Swingler S, Sims A, Jackson Ireland H, Davies EL, Ludlam EM, Li Z, Hughes J, Wilson SJ. Resurrection of 2'-5'-oligoadenylate synthetase 1 (OAS1) from the ancestor of modern horseshoe bats blocks SARS-CoV-2 replication. PLoS Biol 2023; 21:e3002398. [PMID: 38015855 PMCID: PMC10683996 DOI: 10.1371/journal.pbio.3002398] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 11/30/2023]
Abstract
The prenylated form of the human 2'-5'-oligoadenylate synthetase 1 (OAS1) protein has been shown to potently inhibit the replication of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the virus responsible for the Coronavirus Disease 2019 (COVID-19) pandemic. However, the OAS1 orthologue in the horseshoe bats (superfamily Rhinolophoidea), the reservoir host of SARS-related coronaviruses (SARSr-CoVs), has lost the prenylation signal required for this antiviral activity. Herein, we used an ancestral state reconstruction approach to predict and reconstitute in vitro the most likely OAS1 protein sequence expressed by the Rhinolophoidea common ancestor prior to its prenylation loss (RhinoCA OAS1). We exogenously expressed the ancient bat protein in vitro to show that, unlike its non-prenylated horseshoe bat descendants, RhinoCA OAS1 successfully blocks SARS-CoV-2 replication. Using protein structure predictions in combination with evolutionary hypothesis testing methods, we highlight sites under unique diversifying selection specific to OAS1's evolution in the Rhinolophoidea. These sites are located near the RNA-binding region and the C-terminal end of the protein, where the prenylation signal would have been. Our results confirm that OAS1 prenylation loss at the base of the Rhinolophoidea clade ablated the ability of OAS1 to restrict SARSr-CoV replication and that subsequent evolution of the gene in these bats likely favoured an alternative function. These findings can advance our understanding of the tightly linked association between SARSr-CoVs and horseshoe bats.
Affiliation(s)
- Spyros Lytras
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Arthur Wickenhagen
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Elena Sugrue
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Douglas G. Stewart
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Simon Swingler
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Anna Sims
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Hollie Jackson Ireland
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Emma L. Davies
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Eliza M. Ludlam
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Zhuonan Li
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Joseph Hughes
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Sam J. Wilson
- MRC–University of Glasgow Centre for Virus Research, University of Glasgow, Glasgow, United Kingdom
- Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Department of Medicine, University of Cambridge, Cambridge, United Kingdom
11
Liu Z, Zhao Y, Zhang Y, Xu L, Zhou L, Yang W, Zhao H, Zhao J, Wang F. Development of Omni InDel and supporting database for maize. Front Plant Sci 2023; 14:1216505. [PMID: 37457340 PMCID: PMC10344896 DOI: 10.3389/fpls.2023.1216505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023]
Abstract
Insertions-deletions (InDels) are the second most abundant molecular marker in the genome and have been widely used in molecular biology research along with simple sequence repeats (SSR) and single-nucleotide polymorphisms (SNP). However, InDel variant mining and marker development usually focus on a single type of dimorphic InDel, which does not reflect the overall InDel diversity across the genome. Here, we developed Omni InDels for maize, soybean, and rice based on sequencing data and genome assembly that included InDel variants with base lengths from 1 bp to several Mb, and we conducted a detailed classification of Omni InDels. Moreover, we screened a set of InDels that are easily detected and typed (Perfect InDels) from the Omni InDels, verified the site authenticity using 3,587 germplasm resources from 11 groups, and analyzed the germplasm resources. Furthermore, we developed a Multi-InDel set based on the Omni InDels; each Multi-InDel contains multiple InDels, which greatly increases site polymorphism, and can be detected on multiple platforms such as fluorescent capillary electrophoresis and sequencing. Finally, we developed an online database website to make Omni InDels easy to use and share, and developed a visual browsing function called "Variant viewer" for all Omni InDel sites to better display the variant distribution.
Affiliation(s)
- Zhihao Liu
- Key Laboratory of Crop DNA Fingerprinting Innovation and Utilization (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Beijing Academy of Agricultural and Forest Sciences (BAAFS), Beijing, China
- College of Agriculture, Jilin Agricultural University, Changchun, China
- Yikun Zhao
- Key Laboratory of Crop DNA Fingerprinting Innovation and Utilization (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Beijing Academy of Agricultural and Forest Sciences (BAAFS), Beijing, China
- Yunlong Zhang
- Key Laboratory of Crop DNA Fingerprinting Innovation and Utilization (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Beijing Academy of Agricultural and Forest Sciences (BAAFS), Beijing, China
- Liwen Xu
- Key Laboratory of Crop DNA Fingerprinting Innovation and Utilization (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Beijing Academy of Agricultural and Forest Sciences (BAAFS), Beijing, China
- Ling Zhou
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu, China
- Weiguang Yang
- College of Agriculture, Jilin Agricultural University, Changchun, China
- Han Zhao
- Provincial Key Laboratory of Agrobiology, Institute of Crop Germplasm and Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing, Jiangsu, China
- Jiuran Zhao
- Key Laboratory of Crop DNA Fingerprinting Innovation and Utilization (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Beijing Academy of Agricultural and Forest Sciences (BAAFS), Beijing, China
- Fengge Wang
- Key Laboratory of Crop DNA Fingerprinting Innovation and Utilization (Co-construction by Ministry and Province), Ministry of Agriculture and Rural Affairs, Beijing Academy of Agricultural and Forest Sciences (BAAFS), Beijing, China
12
Orlandi KN, Phillips SR, Sailer ZR, Harman JL, Harms MJ. Topiary: Pruning the manual labor from ancestral sequence reconstruction. Protein Sci 2023; 32:e4551. [PMID: 36565302 PMCID: PMC9847077 DOI: 10.1002/pro.4551] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Revised: 12/14/2022] [Accepted: 12/17/2022] [Indexed: 12/25/2022]
Abstract
Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum-likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
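Step (6) above, reconstructing ancestral amino acids, can be illustrated with a toy sketch. Note that this is not topiary's method: the abstract describes a maximum-likelihood workflow built on dedicated phylogenetics software, whereas the sketch below swaps in the much simpler Fitch parsimony algorithm for a single alignment column. The tree, taxon names, and residues are hypothetical and serve only to show the idea of inferring an ancestral state from extant ones.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees (binary tuple).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_sets(tree: Tree, leaf_states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass for one alignment column.

    Returns the set of equally parsimonious states at the root of `tree`
    and the minimum number of state changes required within it.
    """
    if isinstance(tree, str):
        # A leaf contributes its observed residue and no changes.
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_sets(left, leaf_states)
    right_set, right_cost = fitch_sets(right, leaf_states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: keep the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical toy data: one alignment column for four made-up taxa.
toy_tree: Tree = (("taxon_a", "taxon_b"), ("taxon_c", "taxon_d"))
toy_column = {"taxon_a": "K", "taxon_b": "K", "taxon_c": "K", "taxon_d": "R"}

root_states, changes = fitch_sets(toy_tree, toy_column)
print(sorted(root_states), changes)
```

In this toy case the root is inferred to carry the residue shared by three of the four taxa, with a single change on the lineage leading to the fourth. Likelihood-based reconstruction additionally uses branch lengths and a substitution model, which is why real pipelines rely on specialized software rather than a sketch like this.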
Affiliation(s)
- Kona N. Orlandi
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Biology, University of Oregon, Eugene, Oregon, USA
- Sophia R. Phillips
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Zachary R. Sailer
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Joseph L. Harman
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
- Michael J. Harms
- Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA
13
Clifton BE, Kozome D, Laurino P. Efficient Exploration of Sequence Space by Sequence-Guided Protein Engineering and Design. Biochemistry 2023; 62:210-220. [PMID: 35245020 DOI: 10.1021/acs.biochem.1c00757] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]
Abstract
The rapid growth of sequence databases over the past two decades means that protein engineers faced with optimizing a protein for any given task will often have immediate access to a vast number of related protein sequences. These sequences encode information about the evolutionary history of the protein and the underlying sequence requirements to produce folded, stable, and functional protein variants. Methods that can take advantage of this information are an increasingly important part of the protein engineering tool kit. In this Perspective, we discuss the utility of sequence data in protein engineering and design, focusing on recent advances in three main areas: the use of ancestral sequence reconstruction as an engineering tool to generate thermostable and multifunctional proteins, the use of sequence data to guide engineering of multipoint mutants by structure-based computational protein design, and the use of unlabeled sequence data for unsupervised and semisupervised machine learning, allowing the generation of diverse and functional protein sequences in unexplored regions of sequence space. Altogether, these methods enable the rapid exploration of sequence space within regions enriched with functional proteins and therefore have great potential for accelerating the engineering of stable, functional, and diverse proteins for industrial and biomedical applications.
Affiliation(s)
- Ben E Clifton
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
- Dan Kozome
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
- Paola Laurino
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna, Okinawa 904-0495, Japan
14
Ghorbani A, Khataeipour SJ, Solbakken MH, Huebert DNG, Khoddami M, Eslamloo K, Collins C, Hori T, Jentoft S, Rise ML, Larijani M. Ancestral reconstruction reveals catalytic inactivation of activation-induced cytidine deaminase concomitant with cold water adaption in the Gadiformes bony fish. BMC Biol 2022; 20:293. [PMID: 36575514 PMCID: PMC9795746 DOI: 10.1186/s12915-022-01489-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 11/30/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND: Antibody affinity maturation in vertebrates requires the enzyme activation-induced cytidine deaminase (AID), which initiates secondary antibody diversification by mutating the immunoglobulin loci. AID-driven antibody diversification is conserved across jawed vertebrates since the emergence of bony and cartilaginous fish. Two exceptions have recently been reported, the Pipefish and Anglerfish, in which the AID-encoding aicda gene has been lost. Both cases are associated with unusual reproductive behavior, including male pregnancy and sexual parasitism. Several cold water fish in the Atlantic cod (Gadinae) family carry an aicda gene that encodes a full-length enzyme but lack affinity-matured antibodies and rely on antibodies of broad antigenic specificity. Hence, we examined the functionality of their AID.
RESULTS: By combining genomics, transcriptomics, immune responsiveness, and functional enzymology of AID from 36 extant species, we demonstrate that the AID of Atlantic cod and related fish has extremely lethargic or no catalytic activity. Through ancestral reconstruction and functional enzymology of 71 AID enzymes, we show that this enzymatic inactivation likely took place relatively recently, at the emergence of the true cod family (Gadidae) from their ancestral Gadiformes order. We show that this AID inactivation is not only concordant with the previously shown loss of key adaptive immune genes and expansion of innate and cell-based immune genes in the Gadiformes but is further reflected in the genomes of these fish in the form of loss of AID-favored sequence motifs in their immunoglobulin variable region genes.
CONCLUSIONS: Recent demonstrations of the loss of the aicda gene in two fish species challenge the paradigm that AID-driven secondary antibody diversification is absolutely conserved in jawed vertebrates. These species have unusual reproductive behaviors that form an evolutionary pressure for a certain loss of immunity to avoid tissue rejection. We report here an instance of catalytic inactivation and functional loss of AID rather than gene loss in a conventionally reproducing vertebrate. Our data suggest that an expanded innate immunity, in addition to lower pathogenic pressures in a cold environment, relieved the pressure to maintain robust secondary antibody diversification. We suggest that in this unique scenario, the AID-mediated collateral genome-wide damage would form an evolutionary pressure to lose AID function.
Affiliation(s)
- Atefeh Ghorbani
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada; Program in Immunology and Infectious Diseases, Division of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Canada
- S. Javad Khataeipour
- Department of Computer Science, Faculty of Science, Memorial University of Newfoundland, St. John’s, Canada
- Monica H. Solbakken
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- David N. G. Huebert
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada; Program in Immunology and Infectious Diseases, Division of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Canada
- Minasadat Khoddami
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Khalil Eslamloo
- Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, Canada
- Cassandra Collins
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Tiago Hori
- Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, Canada
- Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
- Matthew L. Rise
- Department of Ocean Sciences, Memorial University of Newfoundland, St. John’s, Canada
- Mani Larijani
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada; Program in Immunology and Infectious Diseases, Division of Biomedical Sciences, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Canada
15
Hager M, Pöhler MT, Reinhardt F, Wellner K, Hübner J, Betat H, Prohaska S, Mörl M. Substrate Affinity Versus Catalytic Efficiency: Ancestral Sequence Reconstruction of tRNA Nucleotidyltransferases Solves an Enzyme Puzzle. Mol Biol Evol 2022; 39:6835633. [PMID: 36409584 PMCID: PMC9728577 DOI: 10.1093/molbev/msac250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022] Open
Abstract
In tRNA maturation, CCA-addition by tRNA nucleotidyltransferase is a unique and highly accurate reaction. While the mechanism of nucleotide selection and polymerization is well understood, it remains a mystery why bacterial and eukaryotic enzymes exhibit a surprisingly low tRNA substrate affinity while they efficiently catalyze the CCA-addition. The reconstruction and characterization of ancestral enzymes is a versatile tool for gaining insights into the evolution of this high-fidelity RNA synthesis. Here, we investigate a reconstructed candidate of a 2-billion-year-old CCA-adding enzyme from Gammaproteobacteria and compare it to the corresponding modern enzyme of Escherichia coli. We show that the ancestral candidate catalyzes an error-free CCA-addition, but has a much higher tRNA affinity compared with the extant enzyme. The consequence of this increased substrate binding is an enhanced reverse reaction, where the enzyme removes the CCA end from the mature tRNA. As a result, the ancestral candidate exhibits a lower catalytic efficiency in vitro as well as in vivo. Furthermore, the efficient tRNA interaction leads to a processive polymerization, while the extant enzyme catalyzes nucleotide addition in a distributive way. Thus, the modern enzymes increased their polymerization efficiency by lowering the binding affinity to tRNA, so that CCA synthesis is efficiently promoted due to a reduced reverse reaction. Hence, the puzzling weak substrate interaction, which at first glance appears contradictory and detrimental, represents a distinct activity enhancement in the evolution of CCA-adding enzymes.
Affiliation(s)
- Franziska Reinhardt
- Computational EvoDevo Group, Institute for Computer Science, Leipzig University, Härtelstr. 16-18, 04109 Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Härtelstr. 16-18, 04109 Leipzig, Germany
- Karolin Wellner
- Institute for Biochemistry, Leipzig University, Brüderstraße 34, D-04103 Leipzig, Germany
- Jessica Hübner
- Computational EvoDevo Group, Institute for Computer Science, Leipzig University, Härtelstr. 16-18, 04109 Leipzig, Germany
- Heike Betat
- Institute for Biochemistry, Leipzig University, Brüderstraße 34, D-04103 Leipzig, Germany
- Sonja Prohaska
- Computational EvoDevo Group, Institute for Computer Science, Leipzig University, Härtelstr. 16-18, 04109 Leipzig, Germany; Interdisciplinary Centre for Bioinformatics, Leipzig University, Härtelstr. 16-18, 04109 Leipzig, Germany; Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; Complexity Science Hub Vienna, Josefstädter Str. 39, 1080 Wien, Austria
16
Ayuso-Fernández I, Molpeceres G, Camarero S, Ruiz-Dueñas FJ, Martínez AT. Ancestral sequence reconstruction as a tool to study the evolution of wood decaying fungi. Front Fungal Biol 2022; 3:1003489. [PMID: 37746217 PMCID: PMC10512382 DOI: 10.3389/ffunb.2022.1003489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Accepted: 09/22/2022] [Indexed: 09/26/2023]
Abstract
The study of evolution is limited by the techniques available for it. Aside from the use of the fossil record, molecular phylogenetics can provide a detailed characterization of evolutionary histories using genes, genomes and proteins. However, these tools provide scarce biochemical information about the organisms and systems of interest and are therefore very limited when it comes to explaining protein evolution. In the past decade, this limitation has been overcome by the development of ancestral sequence reconstruction (ASR) methods. ASR allows the subsequent resurrection in the laboratory of inferred proteins from now extinct organisms, making it an outstanding tool to study enzyme evolution. Here we review the recent advances in ASR methods and their application to the study of fungal evolution, with special focus on wood-decay fungi as essential organisms in global carbon cycling.
Affiliation(s)
- Iván Ayuso-Fernández
- Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences (NMBU), Ås, Norway
- Gonzalo Molpeceres
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
- Susana Camarero
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
- Angel T. Martínez
- Centro de Investigaciones Biológicas “Margarita Salas” (CIB), CSIC, Madrid, Spain
17
Foley G, Mora A, Ross CM, Bottoms S, Sützl L, Lamprecht ML, Zaugg J, Essebier A, Balderson B, Newell R, Thomson RES, Kobe B, Barnard RT, Guddat L, Schenk G, Carsten J, Gumulya Y, Rost B, Haltrich D, Sieber V, Gillam EMJ, Bodén M. Engineering indel and substitution variants of diverse and ancient enzymes using Graphical Representation of Ancestral Sequence Predictions (GRASP). PLoS Comput Biol 2022; 18:e1010633. [PMID: 36279274 PMCID: PMC9632902 DOI: 10.1371/journal.pcbi.1010633] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/07/2022] [Revised: 11/03/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022] Open
Abstract
Ancestral sequence reconstruction is a technique that is gaining widespread use in molecular evolution studies and protein engineering. Accurate reconstruction requires the ability to handle appropriately large numbers of sequences, as well as insertion and deletion (indel) events, but available approaches exhibit limitations. To address these limitations, we developed Graphical Representation of Ancestral Sequence Predictions (GRASP), which efficiently implements maximum likelihood methods to enable the inference of ancestors of families with more than 10,000 members. GRASP implements partial order graphs (POGs) to represent and infer insertion and deletion events across ancestors, enabling the identification of building blocks for protein engineering. To validate the capacity to engineer novel proteins from realistic data, we predicted ancestor sequences across three distinct enzyme families: glucose-methanol-choline (GMC) oxidoreductases, cytochromes P450, and dihydroxy/sugar acid dehydratases (DHAD). All tested ancestors demonstrated enzymatic activity. Our study demonstrates the ability of GRASP (1) to support large data sets over 10,000 sequences and (2) to employ insertions and deletions to identify building blocks for engineering biologically active ancestors, by exploring variation over evolutionary time.
Affiliation(s)
- Gabriel Foley
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Ariane Mora
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Connie M. Ross
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Scott Bottoms
- Campus Straubing for Biotechnology and Sustainability, Technische Universität München, Straubing, Germany
- Leander Sützl
- Institut für Lebensmitteltechnologie, Universität für Bodenkultur Wien, Vienna, Austria
- Marnie L. Lamprecht
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Julian Zaugg
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Alexandra Essebier
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Brad Balderson
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Rhys Newell
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Raine E. S. Thomson
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Bostjan Kobe
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, The University of Queensland, Brisbane, Australia
- Ross T. Barnard
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Luke Guddat
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Gerhard Schenk
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Sustainable Minerals Institute, The University of Queensland, Brisbane, Australia
- Jörg Carsten
- Zentralinstitut für Katalyseforschung, Technische Universität München, Munich, Germany
- Yosephine Gumulya
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Burkhard Rost
- Fakultät für Informatik, Technische Universität München, Munich, Germany
- Dietmar Haltrich
- Institut für Lebensmitteltechnologie, Universität für Bodenkultur Wien, Vienna, Austria
- Volker Sieber
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Campus Straubing for Biotechnology and Sustainability, Technische Universität München, Straubing, Germany
- Zentralinstitut für Katalyseforschung, Technische Universität München, Munich, Germany
- Elizabeth M. J. Gillam
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
18
Fer E, McGrath KM, Guy L, Hockenberry AJ, Kaçar B. Early divergence of translation initiation and elongation factors. Protein Sci 2022; 31:e4393. [PMID: 36250475 PMCID: PMC9601768 DOI: 10.1002/pro.4393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 07/05/2022] [Accepted: 07/11/2022] [Indexed: 11/18/2022]
Abstract
Protein translation is a foundational attribute of all living cells. The translation function carried out by the ribosome critically depends on an assortment of protein interaction partners, collectively referred to as the translation machinery. Various studies suggest that the diversification of the translation machinery occurred prior to the last universal common ancestor, yet it is unclear whether the predecessors of the extant translation machinery factors were functionally distinct from their modern counterparts. Here we reconstructed the shared ancestral trajectory and subsequent evolution of essential translation factor GTPases, elongation factor EF-Tu (aEF-1A/eEF-1A), and initiation factor IF2 (aIF5B/eIF5B). Based upon their similar functions and structural homologies, it has been proposed that EF-Tu and IF2 emerged from an ancient common ancestor. We generated the phylogenetic tree of IF2 and EF-Tu proteins and reconstructed ancestral sequences corresponding to the deepest nodes in their shared evolutionary history, including the last common IF2 and EF-Tu ancestor. By identifying the residue and domain substitutions, as well as structural changes along the phylogenetic history, we developed an evolutionary scenario for the origins, divergence and functional refinement of EF-Tu and IF2 proteins. Our analyses suggest that the common ancestor of IF2 and EF-Tu was an IF2-like GTPase protein. Given the central importance of the translation machinery to all cellular life, its earliest evolutionary constraints and trajectories are key to characterizing the universal constraints and capabilities of cellular evolution.
Affiliation(s)
- Evrim Fer
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, Wisconsin, USA
- NASA Center for Early Life and Evolution, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Kaitlyn M. McGrath
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- NASA Center for Early Life and Evolution, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona, USA
- Lionel Guy
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Adam J. Hockenberry
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, USA
- Betül Kaçar
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- NASA Center for Early Life and Evolution, University of Wisconsin-Madison, Madison, Wisconsin, USA
19
Kille B, Balaji A, Sedlazeck FJ, Nute M, Treangen TJ. Multiple genome alignment in the telomere-to-telomere assembly era. Genome Biol 2022; 23:182. [PMID: 36038949 PMCID: PMC9421119 DOI: 10.1186/s13059-022-02735-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/20/2021] [Accepted: 07/21/2022] [Indexed: 01/22/2023] Open
Abstract
With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.
Affiliation(s)
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Michael Nute
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
|
20
|
Engineering functional thermostable proteins using ancestral sequence reconstruction. J Biol Chem 2022; 298:102435. [PMID: 36041629 PMCID: PMC9525910 DOI: 10.1016/j.jbc.2022.102435] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/31/2022] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 11/20/2022] Open
Abstract
Natural proteins are often only slightly more stable in the native state than the denatured state, and an increase in environmental temperature can easily shift the balance towards unfolding. Therefore, the engineering of proteins to improve protein stability is an area of intensive research. Thermostable proteins are required to withstand industrial process conditions, for increased shelf-life of protein therapeutics, for developing robust 'biobricks' for synthetic biology applications, and for research purposes (e.g. structure determination). In addition, thermostability buffers the often destabilizing effects of mutations introduced to improve other properties. Rational design approaches to engineering thermostability require structural information, but even with advanced computational methods, it is challenging to predict or parameterize all the relevant structural factors with sufficient precision to anticipate the results of a given mutation. Directed evolution is an alternative when structures are unavailable but requires extensive screening of mutant libraries. Recently however, bioinspired approaches based on phylogenetic analyses have shown great promise. Leveraging the rapid expansion in sequence data and bioinformatic tools, ancestral sequence reconstruction (ASR) can generate highly stable folds for novel applications in industrial chemistry, medicine, and synthetic biology. This review provides an overview of the factors important for successful inference of thermostable proteins by ASR and what it can reveal about the determinants of stability in proteins.
|
21
|
De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets. PLoS Comput Biol 2022; 18:e1010056. [PMID: 35486906 PMCID: PMC9094560 DOI: 10.1371/journal.pcbi.1010056] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/24/2021] [Revised: 05/11/2022] [Accepted: 03/25/2022] [Indexed: 11/26/2022] Open
Abstract
Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. > 100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
Affiliation(s)
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - William Boulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - Lukas Weilguny
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
| | - Conor R. Walker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
- Department of Genetics, University of Cambridge, Cambridge, United Kingdom
| | - Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, California, United States of America
| | - Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California, United States of America
| | - Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, United Kingdom
|
22
|
Garcia AK, Kolaczkowski B, Kaçar B. Reconstruction of nitrogenase predecessors suggests origin from maturase-like proteins. Genome Biol Evol 2022; 14:6531971. [PMID: 35179578 PMCID: PMC8890362 DOI: 10.1093/gbe/evac031] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Accepted: 02/14/2022] [Indexed: 11/17/2022] Open
Abstract
The evolution of biological nitrogen fixation, uniquely catalyzed by nitrogenase enzymes, has been one of the most consequential biogeochemical innovations over life’s history. Though understanding the early evolution of nitrogen fixation has been a longstanding goal from molecular, biogeochemical, and planetary perspectives, its origins remain enigmatic. In this study, we reconstructed the evolutionary histories of nitrogenases, as well as homologous maturase proteins that participate in the assembly of the nitrogenase active-site cofactor but are not able to fix nitrogen. We combined phylogenetic and ancestral sequence inference with an analysis of predicted functionally divergent sites between nitrogenases and maturases to infer the nitrogen-fixing capabilities of their shared ancestors. Our results provide phylogenetic constraints to the emergence of nitrogen fixation and are consistent with a model wherein nitrogenases emerged from maturase-like predecessors. Though the precise functional role of such a predecessor protein remains speculative, our results highlight evolutionary contingency as a significant factor shaping the evolution of a biogeochemically essential enzyme.
Affiliation(s)
- Amanda K Garcia
- Department of Bacteriology, University of Wisconsin - Madison, USA
| | - Bryan Kolaczkowski
- Department of Microbiology and Cell Science, University of Florida, Gainesville, Florida, USA
| | - Betül Kaçar
- Department of Bacteriology, University of Wisconsin - Madison, USA
|
23
|
Mascotti ML. Resurrecting Enzymes by Ancestral Sequence Reconstruction. Methods Mol Biol 2022; 2397:111-136. [PMID: 34813062 DOI: 10.1007/978-1-0716-1826-4_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Ancestral Sequence Reconstruction (ASR) allows one to infer the sequences of extinct proteins using the phylogeny of extant proteins. It consists of disclosing the evolutionary history (i.e., the phylogeny) of a protein family of interest and then inferring the sequences of its ancestors (i.e., the nodes in the phylogeny). Assisted by gene synthesis, the selected ancestors can be resurrected in the lab and experimentally characterized. The crucial step to succeed with ASR is starting from a reliable phylogeny. At the same time, it is of the utmost importance to have a clear idea of the evolutionary history of the family under study and the events that influenced it. This allows us to implement ASR with well-defined hypotheses and to apply the appropriate experimental methods. In recent years, ASR has become popular for testing hypotheses about the origin of functionalities and changes in activities, and for understanding the physicochemical properties of proteins, among others. In this context, the aim of this chapter is to present the ASR approach applied to the reconstruction of enzymes (i.e., proteins with catalytic roles). The spirit of this contribution is to provide a basic, hands-on guide for biochemists and biologists who are unfamiliar with molecular phylogenetics.
Affiliation(s)
- Maria Laura Mascotti
- Molecular Enzymology group, University of Groningen, Groningen, The Netherlands; IMIBIO-SL CONICET, Facultad de Química Bioquímica y Farmacia, Universidad Nacional de San Luis, San Luis, Argentina.
|
24
|
Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). Methods Mol Biol 2022; 2397:85-110. [PMID: 34813061 DOI: 10.1007/978-1-0716-1826-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Analyzing the natural evolution of proteins by ancestral sequence reconstruction (ASR) can provide valuable information about the changes in sequence and structure that drive the development of novel protein functions. However, ASR has also been used as a protein engineering tool, as it often generates thermostable proteins which can serve as robust and evolvable templates for enzyme engineering. Importantly, ASR has the potential to provide an insight into the history of insertions and deletions that have occurred in the evolution of a protein family. Indels are strongly associated with functional change during enzyme evolution and represent a largely unexplored source of genetic diversity for designing proteins with novel or improved properties. Current ASR methods differ in the way they handle indels; inclusion or exclusion of indels is often managed subjectively, based on assumptions the user makes about the likelihood of each recombination event, yet most currently available ASR tools provide limited, if any, opportunities for evaluating indel placement in a reconstructed sequence. Graphical Representation of Ancestral Sequence Predictions (GRASP) is an ASR tool that maps indel evolution throughout a reconstruction and enables the evaluation of indel variants. This chapter provides a general protocol for performing a reconstruction using GRASP and using the results to create indel variants. The method addresses protein template selection, sequence curation, alignment refinement, tree building, ancestor reconstruction, evaluation of indel variants and approaches to library development.
|
25
|
Lichman BR. Ancestral Sequence Reconstruction for Exploring Alkaloid Evolution. Methods Mol Biol 2022; 2505:165-179. [PMID: 35732944 DOI: 10.1007/978-1-0716-2349-7_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The complex and bioactive monoterpene indole alkaloids (MIAs) found in Catharanthus roseus and related species are the products of many millions of years of evolution through mutation and natural selection. Ancestral sequence reconstruction (ASR) is a method that combines phylogenetic analysis and experimental biochemistry to infer details about past events in protein evolution. Here, I propose that ASR could be leveraged to understand how enzymes catalyzing the formation of complex alkaloids arose over evolutionary time. I discuss the steps of ASR, including sequence selection, multiple sequence alignment, tree inference, and the generation and characterization of inferred ancestral enzymes.
Affiliation(s)
- Benjamin R Lichman
- Centre for Novel Agricultural Products, Department of Biology, University of York, York, UK.
|
26
|
Garcia AK, Fer E, Sephus C, Kacar B. An Integrated Method to Reconstruct Ancient Proteins. Methods Mol Biol 2022; 2569:267-281. [PMID: 36083453 DOI: 10.1007/978-1-0716-2691-7_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/24/2023]
Abstract
Proteins have played a fundamental role throughout life's history on Earth. Despite their biological importance, the ancient origin, early function, and evolution of proteins can seldom be studied directly because few of these attributes are preserved across geologic timescales. Ancestral sequence reconstruction (ASR) provides a method to infer ancestral amino acid sequences and determine the evolutionary predecessors of modern-day proteins using phylogenetic tools. Laboratory application of ASR allows ancient sequences to be deduced from genetic information available in extant organisms and then experimentally resurrected to elucidate ancestral characteristics. In this article, we provide a generalized, stepwise protocol that considers the major elements of a well-designed ASR study and details potential sources of reconstruction bias that can reduce the relevance of historical inferences. We underscore key stages in our approach so that it may be broadly utilized to reconstruct the evolutionary histories of proteins.
Affiliation(s)
- Amanda K Garcia
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
| | - Evrim Fer
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
- Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, WI, USA
| | - Cathryn Sephus
- Scripps Institution of Oceanography, University of California at San Diego, La Jolla, CA, USA
| | - Betul Kacar
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA.
|
27
|
Farihan Afnan Mohd Rozi M, Noor Zaliha Raja Abd Rahman R, Thean Chor Leow A, Shukuri Mohamad Ali M. Ancestral Sequence Reconstruction of Ancient Lipase from Family I.3 Bacterial Lipolytic Enzymes. Mol Phylogenet Evol 2021; 168:107381. [PMID: 34968679 DOI: 10.1016/j.ympev.2021.107381] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/23/2020] [Revised: 10/27/2021] [Accepted: 11/29/2021] [Indexed: 01/14/2023]
Abstract
Family I.3 lipase is distinguished from other families by its amino acid sequence and secretion mechanism. Little is known about the evolutionary process driving these differences. This study attempts to understand how the diverse temperature stabilities of bacterial lipases from family I.3 evolved. To achieve that, eighty-three protein sequences sharing a minimum of 30% sequence identity with Antarctic Pseudomonas sp. AMS8 lipase were used to infer a phylogenetic tree. Using the ancestral sequence reconstruction (ASR) technique, the last universal common ancestor (LUCA) sequence of family I.3 was reconstructed. A gene encoding LUCA was synthesized, cloned, and expressed as inclusion bodies in an E. coli system. The insoluble form of LUCA was refolded using a urea dilution method and then purified using affinity chromatography. The purified LUCA exhibited an optimum temperature of 70 °C and an optimum pH of 10. Various metal ions increased or retained the activity of LUCA. LUCA also demonstrated tolerance towards various organic solvents at 25% v/v concentration. The findings of this study could support the understanding of temperature and environmental conditions in ancient times. Overall, reconstructed ancestral enzymes have improved physicochemical properties that make them suitable for industrial applications, and the ASR technique can be employed as a general technique for enzyme engineering.
Affiliation(s)
- Mohamad Farihan Afnan Mohd Rozi
- Enzyme and Microbial Technology Research Centre (EMTech), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia; Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
| | - Raja Noor Zaliha Raja Abd Rahman
- Enzyme and Microbial Technology Research Centre (EMTech), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia; Department of Microbiology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia; Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
| | - Adam Thean Chor Leow
- Enzyme and Microbial Technology Research Centre (EMTech), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia; Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia; Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
| | - Mohd Shukuri Mohamad Ali
- Enzyme and Microbial Technology Research Centre (EMTech), Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia; Department of Biochemistry, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia; Institute of Bioscience, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia.
|
28
|
De Maio N, Boulton W, Weilguny L, Walker CR, Turakhia Y, Corbett-Detig R, Goldman N. phastSim: efficient simulation of sequence evolution for pandemic-scale datasets. bioRxiv 2021:2021.03.15.435416. [PMID: 33758852 PMCID: PMC7987011 DOI: 10.1101/2021.03.15.435416] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/30/2022]
Abstract
Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, as well as being part of some inference methods. The ongoing surge in available sequence data is however testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. > 100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software is available from https://github.com/NicolaDM/phastSim and allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
Affiliation(s)
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - William Boulton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Lukas Weilguny
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Conor R. Walker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
| | - Yatish Turakhia
- Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
| | - Russell Corbett-Detig
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Nick Goldman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
|
29
|
Dantas PHLF, José MV, de Farias ST. Structural Computational Analysis of the Natural History of Class I aminoacyl-tRNA Synthetases Suggests their Role in Establishing the Genetic Code. J Mol Evol 2021; 89:611-617. [PMID: 34505179 DOI: 10.1007/s00239-021-10029-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/07/2021] [Accepted: 09/02/2021] [Indexed: 12/11/2022]
Abstract
The evolutionary history of Class I aminoacyl-tRNA synthetases (aaRS) through the reconstruction of ancestral sequences is presented. From structural molecular modeling, we sought to understand their relationship with the acceptor arms and the tRNA anticodon loop, how this relationship was established, and the possible implications in determining the genetic code and the translation system. The results of the molecular docking showed that in 7 out of 9 aaRS, the acceptor arm and the anticodon loop bind in practically the same region. The domain accretion process in aaRS and the repositioning of interactions between tRNAs and aaRS are illustrated. Based on these results, we propose that the operational code and the anticodon code coexisted, competing for the aaRS catalytic region, which consequently contributed to the stabilization of these proteins.
Affiliation(s)
- Pedro Henrique Lopes Ferreira Dantas
- Laboratório de Genética Evolutiva Paulo Leminski, Centro de Ciências Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil
| | - Marco V José
- Network of Researchers on the Chemical Evolution of Life (NoRCEL), Leeds, LS7 3RB, UK; Theoretical Biology Group, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, C.P. 04510, Mexico, Mexico
| | - Sávio Torres de Farias
- Laboratório de Genética Evolutiva Paulo Leminski, Centro de Ciências Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil; Network of Researchers on the Chemical Evolution of Life (NoRCEL), Leeds, LS7 3RB, UK.
|
30
|
Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021; 38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/27/2022] Open
Abstract
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Dana Rapoport
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Alon Itzkovitch
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Omer Israeli
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Dana Azouri
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel; School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, Arizona, USA; School of Life Sciences, Arizona State University, Tempe, Arizona, USA
| | - Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
| | - Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
|
31
|
Understanding the Genetic Diversity of Picobirnavirus: A Classification Update Based on Phylogenetic and Pairwise Sequence Comparison Approaches. Viruses 2021; 13:v13081476. [PMID: 34452341 PMCID: PMC8402817 DOI: 10.3390/v13081476] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/27/2021] [Revised: 07/20/2021] [Accepted: 07/23/2021] [Indexed: 11/29/2022] Open
Abstract
Picobirnaviruses (PBVs) are small, double stranded RNA viruses with an ability to infect a myriad of hosts and possessing a high degree of genetic diversity. PBVs are currently classified into two genogroups based upon classification of a 200 nt sequence of RdRp. We demonstrate here that this phylogenetic marker is saturated, affected by homoplasy, and has high phylogenetic noise, resulting in 34% unsolved topologies. By contrast, full-length RdRp sequences provide reliable topologies that allow ancestralism of members to be correctly inferred. MAFFT alignment and maximum likelihood trees were established as the optimal methods to determine phylogenetic relationships, providing complete resolution of PBV RdRp and capsid taxa, each into three monophyletic groupings. Pairwise distance calculations revealed these lineages represent three species. For RdRp, the application of cutoffs determined by theoretical taxonomic distributions indicates that there are five genotypes in species 1, eight genotypes in species 2, and three genotypes in species 3. Capsids were also divided into three species, but sequences did not segregate into statistically supported subdivisions, indicating that diversity is lower than RdRp. We thus propose the adoption of a new nomenclature to indicate the species of each segment (e.g., PBV-C1R2).
|
32
|
Aadland K, Kolaczkowski B. Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy. Genome Biol Evol 2021; 12:1549-1565. [PMID: 32785673 PMCID: PMC7523730 DOI: 10.1093/gbe/evaa164] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 08/03/2020] [Indexed: 12/31/2022] Open
Abstract
Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.
Affiliation(s)
- Kelsey Aadland
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida
| | - Bryan Kolaczkowski
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida
|
33
|
Spence MA, Kaczmarski JA, Saunders JW, Jackson CJ. Ancestral sequence reconstruction for protein engineers. Curr Opin Struct Biol 2021; 69:131-141. [PMID: 34023793 DOI: 10.1016/j.sbi.2021.04.001] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 12/15/2020] [Revised: 03/22/2021] [Accepted: 04/07/2021] [Indexed: 12/11/2022]
Abstract
In addition to its value in the study of molecular evolution, ancestral sequence reconstruction (ASR) has emerged as a useful methodology for engineering proteins with enhanced properties. Proteins generated by ASR often exhibit unique or improved activity, stability, and/or promiscuity, all of which are properties that are valued by protein engineers. Comparison between extant proteins and evolutionary intermediates generated by ASR also allows protein engineers to identify substitutions that have contributed to functional innovation or diversification within protein families. As ASR becomes more widely adopted as a protein engineering approach, it is important to understand the applications, limitations, and recent developments of this technique. This review highlights recent exemplifications of ASR, as well as technical aspects of the reconstruction process that are relevant to protein engineering.
Affiliation(s)
- Matthew A Spence
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Joe A Kaczmarski
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Jake W Saunders
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
- Colin J Jackson
  - Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Peptide & Protein Science, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
  - ARC Centre of Excellence for Innovations in Synthetic Biology, Research School of Chemistry, Australian National University, Canberra, ACT 2601, Australia
|
34
|
Abstract
Multiple sequence alignment is a core first step in many bioinformatics analyses, and errors in these alignments can have negative consequences for scientific studies. In this article, we review some of the recent literature evaluating multiple sequence alignment methods and identify specific challenges that arise when performing these evaluations. In particular, we discuss the different trends observed in simulation studies and when using biological benchmarks. Overall, we find that multiple sequence alignment, far from being a "solved problem," would benefit from new attention.
Affiliation(s)
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
|
35
|
Löytynoja A. Phylogeny-Aware Alignment with PRANK and PAGAN. Methods Mol Biol 2021; 2231:17-37. [PMID: 33289884 DOI: 10.1007/978-1-0716-1036-7_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Evolutionary analyses require sequence alignments that correctly represent evolutionary homology. Evolutionary homology and proteins' structural similarity are not the same, and sequence alignments generated with methods designed for structural matching can be seriously misleading in comparative and phylogenetic analyses. The phylogeny-aware alignment algorithm implemented in the program PRANK has been shown to produce good alignments for evolutionary inferences. Unlike other alignment programs, PRANK makes use of phylogenetic information to distinguish alignment gaps caused by insertions or deletions and, thereafter, handles the two types of events differently. As a by-product of the correct handling of insertions and deletions, PRANK can provide the inferred ancestral sequences as a part of the output and mark the alignment gaps differently depending on their origin in insertion or deletion events. As the algorithm infers the evolutionary history of the sequences, PRANK can be sensitive to errors in the guide phylogeny and violations of the underlying assumptions about the origin and patterns of gaps. To mitigate the effects of such model violations, the phylogeny-aware alignment algorithm has been re-implemented in the program PAGAN. By using sequence graphs, PAGAN can model and accumulate evidence from more complex gap structures than PRANK does, and incorporate this uncertainty in the inferred ancestral sequences. These issues are discussed in detail below and practical advice is provided for the use of PRANK and PAGAN in evolutionary analysis. The two software packages can be downloaded from http://wasabiapp.org/software.
Affiliation(s)
- Ari Löytynoja
  - Institute of Biotechnology, HiLIFE, University of Helsinki, Helsinki, Finland
|
36
|
Scossa F, Fernie AR. Ancestral sequence reconstruction - An underused approach to understand the evolution of gene function in plants? Comput Struct Biotechnol J 2021; 19:1579-1594. [PMID: 33868595 PMCID: PMC8039532 DOI: 10.1016/j.csbj.2021.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 02/06/2023] Open
Abstract
Whilst substantial research effort has been placed on understanding the interactions of plant proteins with their molecular partners, relatively few studies in plants - by contrast to work in other organisms - address how these interactions evolve. It is thought that ancestral proteins were more promiscuous than modern proteins and that specificity often evolved following gene duplication and subsequent functional refining. However, ancestral protein resurrection studies have found that some modern proteins have evolved de novo from ancestors lacking those functions. Intriguingly, the new interactions evolved as a consequence of just a few mutations and, as such, acquisition of new functions appears to be neither difficult nor rare; however, only a few of them are incorporated into biological processes before they are lost to subsequent mutations. Here, we detail the approach of ancestral sequence reconstruction (ASR), providing a primer to reconstruct the sequence of an ancestral gene. We will present case studies from a range of different eukaryotes before discussing the few instances where ancestral reconstructions have been used in plants. As ASR is used to dig into the remote evolutionary past, we will also present some alternative genetic approaches to investigate molecular evolution on shorter timescales. We argue that the study of plant secondary metabolism is particularly well suited for ancestral reconstruction studies. Indeed, its ancient evolutionary roots and highly diverse landscape provide an ideal context in which to address the focal issue around the emergence of evolutionary novelties and how this affects the chemical diversification of plant metabolism.
Key Words
- APR, ancestral protein resurrection
- ASR, ancestral sequence reconstruction
- Ancestral sequence reconstruction
- CDS, coding sequence
- Evolution
- GR, glucocorticoid receptor
- GWAS, genome wide association study
- Genomics
- InDel, insertion/deletion
- MCMC, Markov Chain Monte Carlo
- ML, maximum likelihood
- MP, maximum parsimony
- MR, mineralocorticoid receptor
- MSA, multiple sequence alignment
- Metabolism
- NJ, neighbor-joining
- Phylogenetics
- Plants
- SFS, site frequency spectrum
Affiliation(s)
- Federico Scossa
  - Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
  - Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (CREA-GB), Rome, Italy
- Alisdair R. Fernie
  - Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
  - Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
|
37
|
Selberg AGA, Gaucher EA, Liberles DA. Ancestral Sequence Reconstruction: From Chemical Paleogenetics to Maximum Likelihood Algorithms and Beyond. J Mol Evol 2021; 89:157-164. [PMID: 33486547 PMCID: PMC7828096 DOI: 10.1007/s00239-021-09993-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/04/2020] [Accepted: 01/04/2021] [Indexed: 12/13/2022]
Abstract
As both a computational and an experimental endeavor, ancestral sequence reconstruction remains a timely and important technique. Modern approaches to conduct ancestral sequence reconstruction for proteins are built upon a conceptual framework from journal founder Emile Zuckerkandl. On top of this, work on maximum likelihood phylogenetics published in Journal of Molecular Evolution in 1996 was one of the first approaches for generating maximum likelihood ancestral sequences of proteins. From its computational history, future model development needs as well as potential applications in areas as diverse as computational systems biology, molecular community ecology, infectious disease therapeutics and other biomedical applications, and biotechnology are discussed. From its past in this journal, there is a bright future for ancestral sequence reconstruction in the field of evolutionary biology.
Affiliation(s)
- Avery G A Selberg
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Eric A Gaucher
  - Department of Biology, Georgia State University, Atlanta, GA, 30303, USA
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
|
38
|
Laursen L, Čalyševa J, Gibson TJ, Jemth P. Divergent Evolution of a Protein-Protein Interaction Revealed through Ancestral Sequence Reconstruction and Resurrection. Mol Biol Evol 2021; 38:152-167. [PMID: 32750125 PMCID: PMC7782867 DOI: 10.1093/molbev/msaa198] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022] Open
Abstract
The postsynaptic density extends across the postsynaptic dendritic spine with discs large (DLG) as the most abundant scaffolding protein. DLG dynamically alters the structure of the postsynaptic density, thus controlling the function and distribution of specific receptors at the synapse. DLG contains three PDZ domains and one important interaction governing postsynaptic architecture is that between the PDZ3 domain from DLG and a protein called cysteine-rich interactor of PDZ3 (CRIPT). However, little is known regarding functional evolution of the PDZ3:CRIPT interaction. Here, we subjected PDZ3 and CRIPT to ancestral sequence reconstruction, resurrection, and biophysical experiments. We show that the PDZ3:CRIPT interaction is an ancient interaction, which was likely present in the last common ancestor of Eukaryotes, and that high affinity is maintained in most extant animal phyla. However, affinity is low in nematodes and insects, raising questions about the physiological function of the interaction in species from these animal groups. Our findings demonstrate how an apparently established protein-protein interaction involved in cellular scaffolding in bilaterians can suddenly be subject to dynamic evolution including possible loss of function.
Affiliation(s)
- Louise Laursen
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Jelena Čalyševa
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Faculty of Biosciences, Collaboration for Joint PhD Degree between EMBL and Heidelberg University
- Toby J Gibson
  - Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Per Jemth
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
|
39
|
Kropp C, Straub K, Linde M, Babinger P. Hexamerization and thermostability emerged very early during geranylgeranylglyceryl phosphate synthase evolution. Protein Sci 2020; 30:583-596. [PMID: 33342010 PMCID: PMC7888582 DOI: 10.1002/pro.4016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/15/2020] [Revised: 12/09/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022]
Abstract
A large number of archaea live in hyperthermophilic environments. In consequence, their proteins need to adapt to these harsh conditions, including the enzymes that catalyze the synthesis of their membrane ether lipids. The enzyme that catalyzes the formation of the first ether bond in these lipids, geranylgeranylglyceryl phosphate synthase (GGGPS), exists as a hexamer in many hyperthermophilic archaea, and a recent study suggested that hexamerization serves to fine-tune the flexibility-stability trade-off under hyperthermophilic conditions. We have recently reconstructed the sequences of ancestral group II GGGPS enzymes and now present a detailed biochemical characterization of nine of these predecessors, which allowed us to trace back the evolution of hexameric GGGPS and to draw conclusions about the properties of extant GGGPS branches that were not accessible to experiments up to now. Almost all ancestral GGGPS proteins formed hexamers, which demonstrates that hexamerization is even more widespread among the GGGPS family than previously assumed. Furthermore, all experimentally studied ancestral proteins showed high thermostability. Our results indicate that the hexameric oligomerization state and thermostability were present very early during the evolution of group II GGGPS, while the fine-tuning of the flexibility-stability trade-off developed very late, independent of the emergence of hexamerization.
Affiliation(s)
- Cosimo Kropp
  - Institute of Biophysics and Physical Biochemistry, Regensburg Center for Biochemistry, University of Regensburg, Regensburg, Germany
- Kristina Straub
  - Institute of Biophysics and Physical Biochemistry, Regensburg Center for Biochemistry, University of Regensburg, Regensburg, Germany
- Mona Linde
  - Institute of Biophysics and Physical Biochemistry, Regensburg Center for Biochemistry, University of Regensburg, Regensburg, Germany
- Patrick Babinger
  - Institute of Biophysics and Physical Biochemistry, Regensburg Center for Biochemistry, University of Regensburg, Regensburg, Germany
|
40
|
Smith SA, Walker-Hale N, Walker JF. Intragenic Conflict in Phylogenomic Data Sets. Mol Biol Evol 2020; 37:3380-3388. [DOI: 10.1093/molbev/msaa170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023] Open
Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if common, could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, from 1% to >92% of genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.
Affiliation(s)
- Stephen A Smith
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Joseph F Walker
  - The Sainsbury Laboratory (SLCU), University of Cambridge, Cambridge, United Kingdom
|
41
|
Non-conservation of folding rates in the thioredoxin family reveals degradation of ancestral unassisted-folding. Biochem J 2020; 476:3631-3647. [PMID: 31750876 PMCID: PMC6906118 DOI: 10.1042/bcj20190739] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/04/2019] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 01/04/2023]
Abstract
Evolution involves not only adaptation, but also the degradation of superfluous features. Many examples of degradation at the morphological level are known (vestigial organs, for instance). However, the impact of degradation on molecular evolution has been rarely addressed. Thioredoxins serve as general oxidoreductases in all cells. Here, we report extensive mutational analyses on the folding of modern and resurrected ancestral bacterial thioredoxins. Contrary to claims from recent literature, in vitro folding rates in the thioredoxin family are not evolutionarily conserved, but span at least a ∼100-fold range. Furthermore, modern thioredoxin folding is often substantially slower than ancestral thioredoxin folding. Unassisted folding, as probed in vitro, thus emerges as an ancestral vestigial feature that underwent degradation, plausibly upon the evolutionary emergence of efficient cellular folding assistance. More generally, our results provide evidence that degradation of ancestral features shapes, not only morphological evolution, but also the evolution of individual proteins.
|
42
|
Jermiin LS, Catullo RA, Holland BR. A new phylogenetic protocol: dealing with model misspecification and confirmation bias in molecular phylogenetics. NAR Genom Bioinform 2020; 2:lqaa041. [PMID: 33575594 PMCID: PMC7671319 DOI: 10.1093/nargab/lqaa041] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/17/2020] [Revised: 05/18/2020] [Accepted: 06/04/2020] [Indexed: 12/15/2022] Open
Abstract
Molecular phylogenetics plays a key role in comparative genomics and has increasingly significant impacts on science, industry, government, public health and society. In this paper, we posit that the current phylogenetic protocol is missing two critical steps, and that their absence allows model misspecification and confirmation bias to unduly influence phylogenetic estimates. Based on the potential offered by well-established but under-used procedures, such as assessment of phylogenetic assumptions and tests of goodness of fit, we introduce a new phylogenetic protocol that will reduce confirmation bias and increase the accuracy of phylogenetic estimates.
Affiliation(s)
- Lars S Jermiin
  - CSIRO Land & Water, Canberra, ACT 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
  - School of Biology & Environment Science, University College Dublin, Belfield, Dublin 4, Ireland
  - Earth Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Renee A Catullo
  - CSIRO Land & Water, Canberra, ACT 2601, Australia
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
  - School of Science and Health & Hawkesbury Institute of the Environment, Western Sydney University, Penrith, NSW 2751, Australia
- Barbara R Holland
  - School of Natural Sciences, University of Tasmania, Hobart, TAS 7001, Australia
|
43
|
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
|
44
|
Huang X, Albou LP, Mushayahama T, Muruganujan A, Tang H, Thomas PD. Ancestral Genomes: a resource for reconstructed ancestral genes and genomes across the tree of life. Nucleic Acids Res 2020; 47:D271-D279. [PMID: 30371900 PMCID: PMC6323951 DOI: 10.1093/nar/gky1009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/16/2018] [Accepted: 10/10/2018] [Indexed: 11/23/2022] Open
Abstract
A growing number of whole genome sequencing projects, in combination with development of phylogenetic methods for reconstructing gene evolution, have provided us with a window into genomes that existed millions, and even billions, of years ago. Ancestral Genomes (http://ancestralgenomes.org) is a resource for comprehensive reconstructions of these ‘fossil genomes’. Comprehensive sets of protein-coding genes have been reconstructed for 78 genomes of now-extinct species that were the common ancestors of extant species from across the tree of life. The reconstructed genes are based on the extensive library of over 15 000 gene family trees from the PANTHER database, and are updated on a yearly basis. For each ancestral gene, we assign a stable identifier, and provide additional information designed to facilitate analysis: an inferred name, a reconstructed protein sequence, a set of inferred Gene Ontology (GO) annotations, and a ‘proxy gene’ for each ancestral gene, defined as the least-diverged descendant of the ancestral gene in a given extant genome. On the Ancestral Genomes website, users can browse the Ancestral Genomes by selecting nodes in a species tree, and can compare an extant genome with any of its reconstructed ancestors to understand how the genome evolved.
Affiliation(s)
- Xiaosong Huang
  - School of Life Sciences, Guangzhou University, Guangzhou 510006, China
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Laurent-Philippe Albou
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Tremayne Mushayahama
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Anushya Muruganujan
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Haiming Tang
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
- Paul D Thomas
  - Division of Bioinformatics, Department of Preventive Medicine, Keck School of Medicine of USC, University of Southern California, Los Angeles, CA 90033, USA
|
45
|
Evolution of Predicted Acid Resistance Mechanisms in the Extremely Acidophilic Leptospirillum Genus. Genes (Basel) 2020; 11:genes11040389. [PMID: 32260256 PMCID: PMC7231039 DOI: 10.3390/genes11040389] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/31/2019] [Revised: 03/02/2020] [Accepted: 03/04/2020] [Indexed: 02/01/2023] Open
Abstract
Organisms that thrive in extremely acidic environments (≤pH 3.5) are of widespread importance in industrial applications, environmental issues, and evolutionary studies. Leptospirillum spp. constitute the only extremely acidophilic microbes in the phylogenetically deep-rooted bacterial phylum Nitrospirae. Leptospirilli are Gram-negative, obligatory chemolithoautotrophic, aerobic, ferrous iron oxidizers. This paper predicts genes that Leptospirilli use to survive at low pH and infers their evolutionary trajectory. Phylogenetic and other bioinformatic approaches suggest that these genes can be classified into (i) a "first line of defense", involved in preventing the entry of protons into the cell, and (ii) a "second line of defense", involved in the neutralization or expulsion of protons that enter the cell. The first line of defense includes potassium transporters, predicted to form an inside positive membrane potential, spermidines, hopanoids, and Slps (starvation-inducible outer membrane proteins). The second line of defense includes proton pumps and enzymes that consume protons. Maximum parsimony, clustering methods, and gene alignments are used to infer the evolutionary trajectory that potentially enabled the ancestral Leptospirillum to transition from a postulated circum-neutral pH environment to an extremely acidic one. The hypothesized trajectory includes gene gain/loss events driven extensively by horizontal gene transfer, gene duplications, gene mutations, and genomic rearrangements.
|
46
|
Suvorov A, Hochuli J, Schrider DR. Accurate Inference of Tree Topologies from Multiple Sequence Alignments Using Deep Learning. Syst Biol 2020; 69:221-233. [PMID: 31504938 PMCID: PMC8204903 DOI: 10.1093/sysbio/syz060] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/07/2019] [Accepted: 08/28/2019] [Indexed: 11/13/2022] Open
Abstract
Reconstructing the phylogenetic relationships between species is one of the most formidable tasks in evolutionary biology. Multiple methods exist to reconstruct phylogenetic trees, each with its own strengths and weaknesses. Both simulation and empirical studies have identified several "zones" of parameter space where accuracy of some methods can plummet, even for four-taxon trees. Further, some methods can have undesirable statistical properties such as statistical inconsistency and/or the tendency to be positively misleading (i.e. assert strong support for the incorrect tree topology). Recently, deep learning techniques have made inroads on a number of both new and longstanding problems in biological research. In this study, we designed a deep convolutional neural network (CNN) to infer quartet topologies from multiple sequence alignments. This CNN can readily be trained to make inferences using both gapped and ungapped data. We show that our approach is highly accurate on simulated data, often outperforming traditional methods, and is remarkably robust to bias-inducing regions of parameter space such as the Felsenstein zone and the Farris zone. We also demonstrate that the confidence scores produced by our CNN can more accurately assess support for the chosen topology than bootstrap and posterior probability scores from traditional methods. Although numerous practical challenges remain, these findings suggest that deep learning approaches such as ours have the potential to produce more accurate phylogenetic inferences.
Affiliation(s)
- Anton Suvorov
  - Department of Genetics, University of North Carolina at Chapel Hill, 120 Mason Farm Road, UNC-Chapel Hill, Chapel Hill, NC 27599-7264, USA
- Joshua Hochuli
  - Biological and Biomedical Sciences Program, University of North Carolina at Chapel Hill, 130 Mason Farm Road, UNC-Chapel Hill Chapel Hill, NC 27599-7264, USA
- Daniel R Schrider
  - Biological and Biomedical Sciences Program, University of North Carolina at Chapel Hill, 130 Mason Farm Road, UNC-Chapel Hill Chapel Hill, NC 27599-7264, USA
|
47
|
Hunnicutt KE, Tiley GP, Williams RC, Larsen PA, Blanco MB, Rasoloarison RM, Campbell CR, Zhu K, Weisrock DW, Matsunami H, Yoder AD. Comparative Genomic Analysis of the Pheromone Receptor Class 1 Family (V1R) Reveals Extreme Complexity in Mouse Lemurs (Genus, Microcebus) and a Chromosomal Hotspot across Mammals. Genome Biol Evol 2020; 12:3562-3579. [PMID: 31555816 PMCID: PMC6944220 DOI: 10.1093/gbe/evz200] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Accepted: 09/08/2019] [Indexed: 12/14/2022] Open
Abstract
Sensory gene families are of special interest for both what they can tell us about molecular evolution and what they imply as mediators of social communication. The vomeronasal type-1 receptors (V1Rs) have often been hypothesized as playing a fundamental role in driving or maintaining species boundaries given their likely function as mediators of intraspecific mate choice, particularly in nocturnal mammals. Here, we employ a comparative genomic approach for revealing patterns of V1R evolution within primates, with a special focus on the small-bodied nocturnal mouse and dwarf lemurs of Madagascar (genera Microcebus and Cheirogaleus, respectively). By doubling the existing genomic resources for strepsirrhine primates (i.e. the lemurs and lorises), we find that the highly speciose and morphologically cryptic mouse lemurs have experienced an elaborate proliferation of V1Rs that we argue is functionally related to their capacity for rapid lineage diversification. Contrary to a previous study that found equivalent degrees of V1R diversity in diurnal and nocturnal lemurs, our study finds a strong correlation between nocturnality and V1R elaboration, with nocturnal lemurs showing elaborate V1R repertoires and diurnal lemurs showing less diverse repertoires. Recognized subfamilies among V1Rs show unique signatures of diversifying positive selection, as might be expected if they have each evolved to respond to specific stimuli. Furthermore, a detailed syntenic comparison of mouse lemurs with mouse (genus Mus) and other mammalian outgroups shows that orthologous mammalian subfamilies, predicted to be of ancient origin, tend to cluster in a densely populated region across syntenic chromosomes that we refer to as a V1R "hotspot."
Collapse
Affiliation(s)
- Kelsie E Hunnicutt
  - Department of Biology, Duke University, Durham, North Carolina
  - Department of Biological Sciences, University of Denver, Denver, Colorado
- George P Tiley
  - Department of Biology, Duke University, Durham, North Carolina
- Rachel C Williams
  - Department of Biology, Duke University, Durham, North Carolina
  - Duke Lemur Center, Duke University, Durham, North Carolina
- Peter A Larsen
  - Department of Biology, Duke University, Durham, North Carolina
  - Department of Veterinary and Biomedical Sciences, University of Minnesota, Saint Paul, Minnesota
- Rodin M Rasoloarison
  - Behavioral Ecology and Sociobiology Unit, German Primate Centre, Göttingen, Germany
  - Département de Biologie Animale, Université d’Antananarivo, Madagascar, Antananarivo, Madagascar
- C Ryan Campbell
  - Department of Biology, Duke University, Durham, North Carolina
- Kevin Zhu
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina
- David W Weisrock
  - Department of Biology, University of Kentucky, Lexington, Kentucky
- Hiroaki Matsunami
  - Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina
  - Department of Neurobiology, Duke Institute for Brain Sciences, Duke University Medical Center, Durham, North Carolina
- Anne D Yoder
  - Department of Biology, Duke University, Durham, North Carolina
|
48
|
Trivedi R, Nagarajaram HA. Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Sci Rep 2019; 9:16380. [PMID: 31704957 PMCID: PMC6841959 DOI: 10.1038/s41598-019-52532-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/18/2019] [Accepted: 10/15/2019] [Indexed: 01/09/2023] Open
Abstract
An amino acid substitution scoring matrix encapsulates the rates at which amino acid residues in proteins are substituted by other residues over time. Database search methods use substitution scoring matrices to identify homologous sequences. However, widely used substitution scoring matrices, such as the BLOSUM series, were developed from aligned blocks that are mostly devoid of disordered regions. Disordered regions have a distinct amino acid compositional bias and are therefore expected to have undergone substitutions that differ from those in ordered regions, which makes these matrices largely inappropriate for homology searches involving proteins enriched with disordered regions. We therefore developed a novel series of substitution scoring matrices, referred to as EDSSMat, by exclusively considering the substitution frequencies of amino acids in the disordered regions of eukaryotic proteins. The newly developed matrices were tested for their ability to detect homologs of proteins enriched with disordered regions by means of the SSEARCH tool. The results unequivocally demonstrate that EDSSMat matrices detect more homologs than the widely used BLOSUM, PAM, and other standard matrices, indicating their utility for homology searches of intrinsically disordered proteins.
Affiliation(s)
- Rakesh Trivedi
  - Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, Uppal, Hyderabad, Telangana, 500039, India
  - Graduate School, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
- Hampapathalu Adimurthy Nagarajaram
  - Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, 500 046, India
  - Centre for Modelling, Simulation and Design, University of Hyderabad, Hyderabad, Telangana, 500 046, India
|
49
|
Garcia AK, Kaçar B. How to resurrect ancestral proteins as proxies for ancient biogeochemistry. Free Radic Biol Med 2019; 140:260-269. [PMID: 30951835 DOI: 10.1016/j.freeradbiomed.2019.03.033] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/17/2018] [Revised: 02/11/2019] [Accepted: 03/26/2019] [Indexed: 10/27/2022]
Abstract
Throughout the history of life, enzymes have served as the primary molecular mediators of biogeochemical cycles by catalyzing the metabolic pathways that interact with geochemical substrates. The byproducts of enzymatic activities have been preserved as chemical and isotopic signatures in the geologic record. However, interpretations of these signatures are limited by the assumption that such enzymes have remained functionally conserved over billions of years of molecular evolution. By reconstructing ancient genetic sequences in conjunction with laboratory enzyme resurrection, preserved biogeochemical signatures can instead be related to experimentally constrained, ancestral enzymatic properties. We may thereby investigate instances within molecular evolutionary trajectories potentially tied to significant biogeochemical transitions evidenced in the geologic record. Here, we survey recent enzyme resurrection studies to provide a reasoned assessment of areas of success and common pitfalls relevant to ancient biogeochemical applications. We conclude by considering the Great Oxidation Event, which provides a constructive example of a significant biogeochemical transition that warrants investigation with ancestral enzyme resurrection. This event also serves to highlight the pitfalls of facile interpretation of paleophenotype models and data, as applied to two examples of enzymes that likely both influenced and were influenced by the rise of atmospheric oxygen - RuBisCO and nitrogenase.
Affiliation(s)
- Amanda K Garcia
  - Department of Molecular and Cell Biology, University of Arizona, Tucson, AZ, 85721, USA
- Betül Kaçar
  - Department of Molecular and Cell Biology, University of Arizona, Tucson, AZ, 85721, USA
  - Department of Astronomy and Steward Observatory, University of Arizona, Tucson, AZ, 85721, USA
|
50
|
Six Impossible Things before Breakfast: Assumptions, Models, and Belief in Molecular Dating. Trends Ecol Evol 2019; 34:474-486. [PMID: 30904189 DOI: 10.1016/j.tree.2019.01.017] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/18/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 01/16/2023]
Abstract
Confidence in molecular dating analyses has grown with the increasing sophistication of the methods. Some problematic cases where molecular dates disagreed with paleontological estimates appear to have been resolved with a growing agreement between molecules and fossils. But we cannot relax just yet. The growing analytical sophistication of many molecular dating methods relies on an increasingly large number of assumptions about evolutionary history and processes. Many of these assumptions are based on statistical tractability rather than being informed by improved understanding of molecular evolution, yet changing the assumptions can influence molecular dates. How can we tell if the answers we get are driven more by the assumptions we make than by the molecular data being analyzed?
|