1
de la Fuente R, Díaz-Villanueva W, Arnau V, Moya A. Genomic Signature in Evolutionary Biology: A Review. BIOLOGY 2023; 12:biology12020322. [PMID: 36829597 PMCID: PMC9953303 DOI: 10.3390/biology12020322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]
Abstract
Organisms are unique physical entities in which information is stored and continuously processed. The digital nature of DNA sequences enables the construction of a dynamic information reservoir. However, the distinction between the hardware and software components in the information flow is crucial to identify the mechanisms generating specific genomic signatures. In this work, we perform a bibliometric analysis to identify the different purposes of looking for particular patterns in DNA sequences associated with a given phenotype. This study has enabled us to make a conceptual breakdown of the genomic signature and differentiate the leading applications. On the one hand, it refers to gene expression profiling associated with a biological function, which may be shared across taxa. This signature is the focus of study in precision medicine. On the other hand, it also refers to characteristic patterns in species-specific DNA sequences. This interpretation plays a key role in comparative genomics, identifying evolutionary relationships. Looking at the relevant studies in our bibliographic database, we highlight the main factors causing heterogeneities in genome composition and how they can be quantified. All these findings lead us to reformulate some questions relevant to evolutionary biology.
Affiliation(s)
- Rebeca de la Fuente
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Wladimiro Díaz-Villanueva
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Vicente Arnau
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Andrés Moya
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
  - Foundation for the Promotion of Sanitary and Biomedical Research of the Valencian Community (FISABIO), 46020 Valencia, Spain
  - CIBER in Epidemiology and Public Health (CIBEResp), 28029 Madrid, Spain
2
Williams TA, Schrempf D, Szöllősi GJ, Cox CJ, Foster PG, Embley TM. Inferring the deep past from molecular data. Genome Biol Evol 2021; 13:6192802. [PMID: 33772552 PMCID: PMC8175050 DOI: 10.1093/gbe/evab067] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Accepted: 03/22/2021] [Indexed: 12/17/2022]
Abstract
There is an expectation that analyses of molecular sequences might be able to distinguish between alternative hypotheses for ancient relationships, but the phylogenetic methods used and types of data analyzed are of critical importance in any attempt to recover historical signal. Here, we discuss some common issues that can influence the topology of trees obtained when using overly simple models to analyze molecular data that often display complicated patterns of sequence heterogeneity. To illustrate our discussion, we have used three examples of inferred relationships which have changed radically as models and methods of analysis have improved. In two of these examples, the sister-group relationship between thermophilic Thermus and mesophilic Deinococcus, and the position of long-branch Microsporidia among eukaryotes, we show that recovering what is now generally considered to be the correct tree is critically dependent on the fit between model and data. In the third example, the position of eukaryotes in the tree of life, the hypothesis that is currently supported by the best available methods is fundamentally different from the classical view of relationships between major cellular domains. Since heterogeneity appears to be pervasive and varied among all molecular sequence data, and even the best available models can still struggle to deal with some problems, the issues we discuss are generally relevant to phylogenetic analyses. It remains essential to maintain a critical attitude to all trees as hypotheses of relationship that may change with more data and better methods.
Affiliation(s)
- Tom A Williams
  - School of Biological Sciences, University of Bristol, Bristol BS8 1TQ, United Kingdom
- Dominik Schrempf
  - Dept. of Biological Physics, Eötvös Loránd University, 1117 Budapest, Hungary
- Gergely J Szöllősi
  - Dept. of Biological Physics, Eötvös Loránd University, 1117 Budapest, Hungary
  - MTA-ELTE "Lendület" Evolutionary Genomics Research Group, 1117 Budapest, Hungary
  - Institute of Evolution, Centre for Ecological Research, 1121 Budapest, Hungary
- Cymon J Cox
  - Centro de Ciências do Mar, Universidade do Algarve, Gambelas, 8005-319 Faro, Portugal
- Peter G Foster
  - Department of Life Sciences, Natural History Museum, London SW7 5BD, United Kingdom
- T Martin Embley
  - Biosciences Institute, Centre for Bacterial Cell Biology, Newcastle University, Newcastle upon Tyne NE2 4AX, United Kingdom
3
Kuzminkova AA, Sokol AD, Ushakova KE, Popadin KY, Gunbin KV. mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria. BMC Evol Biol 2019; 19:47. [PMID: 30813887 PMCID: PMC6391778 DOI: 10.1186/s12862-019-1371-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
BACKGROUND Heterotachy is the variation in the evolutionary rate of aligned sites in different parts of the phylogenetic tree. It occurs mainly due to epistatic interactions among the substitutions, which are highly complex and make it difficult to study protein evolution. The vast majority of computational evolutionary approaches for studying these epistatic interactions or their evolutionary consequences in proteins require high computational time. However, recently, it has been shown that the evolution of residue solvent accessibility (RSA) is tightly linked with changes in protein fitness and intra-protein epistatic interactions. This provides a computationally fast alternative, based on comparison of evolutionary rates of amino acid replacements with the rates of RSA evolutionary changes, in order to recognize any shifts in epistatic interaction.
RESULTS Based on RSA information, data randomization and phylogenetic approaches, we constructed a software pipeline, which can be used to analyze the evolutionary consequences of intra-protein epistatic interactions with relatively low computational time. We analyzed the evolution of 512 protein families tightly linked to mitochondrial function in Vertebrates and created "mtProtEvol", the web resource with data on protein evolution. In strict agreement with lifespan and metabolic rate data, we demonstrated that different functional categories of mitochondria-related proteins were subjected to selection on accelerated and decelerated RSA rates in rodents and primates. For example, accelerated RSA evolution in rodents has been shown for Krebs cycle enzymes, respiratory chain and reactive oxygen species metabolism, while in primates these functions are stress-response, translation and mtDNA integrity. Decelerated RSA evolution in rodents has been demonstrated for translational machinery and oxidative stress response components.
CONCLUSIONS mtProtEvol is an interactive resource focused on evolutionary analysis of epistatic interactions in protein families involved in Vertebrata mitochondria function and available at http://bioinfodbs.kantiana.ru/mtProtEvol/. This resource and the devised software pipeline may be a useful tool for researchers in the area of protein evolution.
Affiliation(s)
- Anastasia A. Kuzminkova
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Anastasia D. Sokol
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Kristina E. Ushakova
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Konstantin Yu. Popadin
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Konstantin V. Gunbin
  - Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
  - Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
4
Moyers BA, Zhang J. Toward Reducing Phylostratigraphic Errors and Biases. Genome Biol Evol 2018; 10:2037-2048. [PMID: 30060201 PMCID: PMC6105108 DOI: 10.1093/gbe/evy161] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Accepted: 07/28/2018] [Indexed: 01/03/2023]
Abstract
Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age-distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegligible probability. The underestimation is severer for genes with certain properties, creating spurious age distributions of these properties and those correlated with these properties. Here we explore three strategies to reduce phylostratigraphic error/bias. First, we test several alternative homology detection methods (PSIBLAST, HMMER, PHMMER, OMA, and GLAM2Scan) in phylostratigraphy, but fail to find any that noticeably outperforms the commonly used BLASTP. Second, using machine learning, we look for predictors of error-prone genes to exclude from phylostratigraphy, but cannot identify reliable predictors. Finally, we remove from phylostratigraphic analysis genes exhibiting errors in simulation, which by definition minimizes error/bias if the simulation is sufficiently realistic. Using this last approach, we show that some previously reported phylostratigraphic trends (e.g., younger proteins tend to evolve more rapidly and be shorter) disappear or even reverse, reconfirming the necessity of controlling phylostratigraphic error/bias. Taken together, our analyses demonstrate that phylostratigraphic errors/biases are refractory to several potential solutions but can be controlled at least partially by the exclusion of error-prone genes identified via realistic simulations. These results are expected to stimulate the judicious use of error-aware phylostratigraphy and reevaluation of previous phylostratigraphic findings.
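The age-assignment step behind phylostratigraphy, and the way a missed homolog leads to an underestimated gene age, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `assign_phylostratum`, the stratum labels, and the hit sets are hypothetical, and a real analysis would obtain the hits from homology searches such as BLASTP.

```python
# Hypothetical phylostrata, ordered from oldest to youngest.
# The last entry stands for the focal species itself.
STRATA = ["cellular organisms", "Eukaryota", "Opisthokonta", "Metazoa", "focal species"]


def assign_phylostratum(strata, hit_strata):
    """Return the oldest stratum in which a homolog was detected.

    strata     -- list of stratum names, oldest first
    hit_strata -- set of stratum names with at least one detected homolog
    If no homolog is detected anywhere, the gene is assigned to the
    youngest stratum (the focal species).
    """
    for stratum in strata:
        if stratum in hit_strata:
            return stratum
    return strata[-1]


# Assumed ground truth for one gene: homologs exist as far back as Eukaryota.
true_hits = {"Eukaryota", "Metazoa", "focal species"}

# Assumed search result: the remote Eukaryota homolog is missed (a false negative).
observed_hits = true_hits - {"Eukaryota"}

print(assign_phylostratum(STRATA, true_hits))      # Eukaryota
print(assign_phylostratum(STRATA, observed_hits))  # Metazoa
```

In this toy case a single false negative moves the gene from the Eukaryota stratum to the younger Metazoa stratum, which is the direction of error the abstract describes.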
Affiliation(s)
- Bryan A Moyers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan
5
Abstract
Phylostratigraphy, originally designed for gene age estimation by BLAST-based protein homology searches of sequenced genomes, has been widely used for studying patterns and inferring mechanisms of gene origination and evolution. We previously showed by computer simulation that phylostratigraphy underestimates gene age for a nonnegligible fraction of genes and that the underestimation is severer for genes with certain properties such as fast evolution and short protein sequences. Consequently, many previously reported age distributions of gene properties may have been methodological artifacts rather than biological realities. Domazet-Lošo and colleagues recently argued that our simulations were flawed and that phylostratigraphic bias does not impact inferences about gene emergence and evolution. Here we discuss conceptual difficulties of phylostratigraphy, identify numerous problems in Domazet-Lošo et al.’s argument, reconfirm phylostratigraphic error using simulations suggested by Domazet-Lošo and colleagues, and demonstrate that a phylostratigraphic trend claimed to be robust to error disappears when genes likely to be error-resistant are analyzed. We conclude that extreme caution is needed in interpreting phylostratigraphic results because of the inherent biases of the method and that reanalysis using genes exhibiting no error in realistic simulations may help reduce spurious findings.
Affiliation(s)
- Bryan A Moyers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan
6
Gaur U, Tu J, Li D, Gao Y, Lian T, Sun B, Yang D, Fan X, Yang M. Molecular evolutionary patterns of NAD+/Sirtuin aging signaling pathway across taxa. PLoS One 2017; 12:e0182306. [PMID: 28767699 PMCID: PMC5540417 DOI: 10.1371/journal.pone.0182306] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 02/05/2017] [Accepted: 07/16/2017] [Indexed: 12/12/2022]
Abstract
A deeper understanding of the conserved molecular mechanisms in different taxa has been made possible only because of the evolutionary conservation of crucial signaling pathways. In the present study, we explored the molecular evolutionary pattern of selection signatures in 51 species for 10 genes which are important components of the NAD+/Sirtuin pathway and have already been directly linked to lifespan extension in worms and mice. Selection pressure analysis using the PAML program revealed that MRPS5 and PPARGC1A were under significant constraints because of their functional significance. FOXO3a also displayed strong purifying selection. All three sirtuins, which were SIRT1, SIRT2 and SIRT6, displayed a great degree of conservation between taxa, which is consistent with the previous report. A significant evolutionary constraint is seen on the anti-oxidant gene, SOD3. As expected, the TP53 gene was under significant selection pressure in mammals, owing to its major role in tumor progression. Poly-ADP-ribose polymerase (PARP) genes displayed the most sites under positive selection. Further 3D structural analysis of the PARP1 and PARP2 proteins revealed that some of these positively selected sites caused a change in the electrostatic potential of the protein structure, which may allow a change in its interaction with other proteins and molecules, ultimately leading to a difference in function. Although the functional significance of the positively selected sites could not be established in the variants databases, it will be interesting to see if these sites actually affect the function of PARP1 and PARP2.
Affiliation(s)
- Uma Gaur
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Jianbo Tu
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Diyan Li
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Yue Gao
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Ting Lian
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Boyuan Sun
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Deying Yang
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Xiaolan Fan
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
- Mingyao Yang
  - Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu, Sichuan, P. R. China
7
Domazet-Lošo T, Carvunis AR, Albà MM, Šestak MS, Bakaric R, Neme R, Tautz D. No Evidence for Phylostratigraphic Bias Impacting Inferences on Patterns of Gene Emergence and Evolution. Mol Biol Evol 2017; 34:843-856. [PMID: 28087778 PMCID: PMC5400388 DOI: 10.1093/molbev/msw284] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Indexed: 12/05/2022]
Abstract
Phylostratigraphy is a computational framework for dating the emergence of DNA and protein sequences in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and its impact on phylostratigraphic inferences. Here, we re-assessed these simulations. We found that, even with a possible overall BLAST false negative rate between 11–15%, the large majority of sequences assigned to a recent evolutionary origin by phylostratigraphy is unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on previously reported findings, we repeated the original analyses but now excluded all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support phylostratigraphic inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis that does not appear to introduce significant biases into evolutionary pattern inferences.
Affiliation(s)
- Tomislav Domazet-Lošo
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
  - Catholic University of Croatia, Zagreb, Croatia
- M Mar Albà
  - Evolutionary Genomics Group, Research Programme on Biomedical Informatics, Hospital del Mar Research Institute, Universitat Pompeu Fabra, Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Martin Sebastijan Šestak
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Robert Bakaric
  - Laboratory of Evolutionary Genetics, Division of Molecular Biology, Ruđer Bošković Institute, Zagreb, Croatia
- Rafik Neme
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
- Diethard Tautz
  - Max-Planck Institute for Evolutionary Biology, Plön, Germany
8
Schwentner M, Combosch DJ, Pakes Nelson J, Giribet G. A Phylogenomic Solution to the Origin of Insects by Resolving Crustacean-Hexapod Relationships. Curr Biol 2017; 27:1818-1824.e5. [DOI: 10.1016/j.cub.2017.05.040] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 12/26/2016] [Revised: 04/10/2017] [Accepted: 05/10/2017] [Indexed: 12/11/2022]
9
Abstract
Most phylogenetic methods are model-based and depend on models of evolution designed to approximate the evolutionary processes. Several methods have been developed to identify suitable models of evolution for phylogenetic analysis of alignments of nucleotide or amino acid sequences and some of these methods are now firmly embedded in the phylogenetic protocol. However, in a disturbingly large number of cases, it appears that these models were used without acknowledgement of their inherent shortcomings. In this chapter, we discuss the problem of model selection and show how some of the inherent shortcomings may be identified and overcome.
Affiliation(s)
- Vivek Jayaswal
  - School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
- Faisal M Ababneh
  - Department of Mathematics & Statistics, Al-Hussein Bin Talal University, Ma'an, Jordan
- John Robinson
  - School of Mathematics & Statistics, University of Sydney, Sydney, NSW, Australia
10
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 10.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
11
Zhao S, Burki F, Bråte J, Keeling PJ, Klaveness D, Shalchian-Tabrizi K. Collodictyon--an ancient lineage in the tree of eukaryotes. Mol Biol Evol 2012; 29:1557-68. [PMID: 22319147 PMCID: PMC3351787 DOI: 10.1093/molbev/mss001] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.0] [Indexed: 11/25/2022]
Abstract
The current consensus for the eukaryote tree of life consists of several large assemblages (supergroups) that are hypothesized to describe the existing diversity. Phylogenomic analyses have shed light on the evolutionary relationships within and between supergroups as well as placed newly sequenced enigmatic species close to known lineages. Yet, a few eukaryote species remain of unknown origin and could represent key evolutionary forms for inferring ancient genomic and cellular characteristics of eukaryotes. Here, we investigate the evolutionary origin of the poorly studied protist Collodictyon (subphylum Diphyllatia) by sequencing a cDNA library as well as the 18S and 28S ribosomal DNA (rDNA) genes. Phylogenomic trees inferred from 124 genes placed Collodictyon close to the bifurcation of the “unikont” and “bikont” groups, either alone or as sister to the potentially contentious excavate Malawimonas. Phylogenies based on rDNA genes confirmed that Collodictyon is closely related to another genus, Diphylleia, and revealed a very low diversity in environmental DNA samples. The early and distinct origin of Collodictyon suggests that it constitutes a new lineage in the global eukaryote phylogeny. Collodictyon shares cellular characteristics with Excavata and Amoebozoa, such as ventral feeding groove supported by microtubular structures and the ability to form thin and broad pseudopods. These may therefore be ancient morphological features among eukaryotes. Overall, this shows that Collodictyon is a key lineage to understand early eukaryote evolution.
Affiliation(s)
- Sen Zhao
  - Microbial Evolution Research Group, Department of Biology, University of Oslo, Oslo, Norway
12
Wang HC, Susko E, Roger AJ. Fast statistical tests for detecting heterotachy in protein evolution. Mol Biol Evol 2011; 28:2305-15. [PMID: 21343603 DOI: 10.1093/molbev/msr050] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/20/2023]
Abstract
The w statistic introduced by Lockhart et al. (1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol. 15:1183-1188) is a simple and easily calculated statistic intended to detect heterotachy by comparing amino acid substitution patterns between two monophyletic groups of protein sequences. It is defined as the difference between the fraction of varied sites in both groups and the fraction of varied sites in each group. The w test has been used to distinguish a covarion process from equal rates and rates variation across sites processes. Using simulation we show that the w test is effective for small data sets and for data sets that have low substitution rates in the groups but can have difficulties when these conditions are not met. Using site entropy as a measure of variability of a sequence site, we modify the w statistic to a w' statistic by assigning as varied in one group those sites that are actually varied in both groups but have a large entropy difference. We show that the w' test has more power to detect two kinds of heterotachy processes (covarion and bivariate rate shifts) in large and variable data. We also show that a test of Pearson's correlation of the site entropies between two monophyletic groups can be used to detect heterotachy and has more power than the w' test. Furthermore, we demonstrate that there are settings where the correlation test as well as w and w' tests do not detect heterotachy signals in data simulated under a branch length mixture model. In such cases, it is sometimes possible to detect heterotachy through subselection of appropriate taxa. Finally, we discuss the abilities of the three statistical tests to detect a fourth mode of heterotachy: lineage-specific changes in proportion of variable sites.
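The entropy-correlation statistic mentioned above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it computes the Shannon entropy of each alignment column within two groups and then Pearson's correlation between the two entropy vectors. The function names and the toy alignments are hypothetical, gaps and ambiguous characters are not handled, and the abstract does not say how significance is assessed, so the sketch stops at the statistic itself.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (in bits) of each column of an aligned group of sequences."""
    n_seqs = len(alignment)
    entropies = []
    for column in zip(*alignment):
        counts = Counter(column)
        # Sum of p * log2(1/p) over the residues observed in this column.
        h = sum((c / n_seqs) * math.log2(n_seqs / c) for c in counts.values())
        entropies.append(h)
    return entropies


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


# Toy alignments for two monophyletic groups (hypothetical data, same columns).
group_a = ["ACDE", "ACDF", "ACEG", "ACEH"]
group_b = ["ACDE", "ACDE", "GCDE", "TCDE"]

h_a = site_entropies(group_a)
h_b = site_entropies(group_b)
r = pearson(h_a, h_b)
print(h_a, h_b, r)
```

In this toy case the sites that vary in one group are conserved in the other, so the correlation is negative; sites whose variability is shared between groups would push it toward 1.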
Affiliation(s)
- Huai-Chun Wang
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
13
Lunzer M, Golding GB, Dean AM. Pervasive cryptic epistasis in molecular evolution. PLoS Genet 2010; 6:e1001162. [PMID: 20975933 PMCID: PMC2958800 DOI: 10.1371/journal.pgen.1001162] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Received: 06/13/2010] [Accepted: 09/16/2010] [Indexed: 11/19/2022]
Abstract
The functional effects of most amino acid replacements accumulated during molecular evolution are unknown, because most are not observed naturally and the possible combinations are too numerous. We created 168 single mutations in wild-type Escherichia coli isopropylmalate dehydrogenase (IMDH) that match the differences found in wild-type Pseudomonas aeruginosa IMDH. 104 mutant enzymes performed similarly to E. coli wild-type IMDH, one was functionally enhanced, and 63 were functionally compromised. The transition from E. coli IMDH, or an ancestral form, to the functional wild-type P. aeruginosa IMDH requires extensive epistasis to ameliorate the combined effects of the deleterious mutations. This result stands in marked contrast with a basic assumption of molecular phylogenetics, that sites in sequences evolve independently of each other. Residues that affect function are scattered haphazardly throughout the IMDH structure. We screened for compensatory mutations at three sites, all of which lie near the active site and all of which are among the least active mutants. No compensatory mutations were found at two sites, indicating that a single site may engage in compound epistatic interactions. One complete and three partial compensatory mutations of the third site are remote and lie in a different domain. This demonstrates that epistatic interactions can occur between distant (>20 Å) sites. Phylogenetic analysis shows that incompatible mutations were fixed in different lineages.
Affiliation(s)
- Mark Lunzer
  - BioTechnology Institute, University of Minnesota, St. Paul, Minnesota, United States of America
- G. Brian Golding
  - Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Antony M. Dean
  - BioTechnology Institute, University of Minnesota, St. Paul, Minnesota, United States of America
  - Department of Ecology, Evolution and Behavior, University of Minnesota, St. Paul, Minnesota, United States of America
  - Instituto Gulbenkian de Ciência, Oeiras, Portugal
14
Studer RA, Robinson-Rechavi M. Large-scale analysis of orthologs and paralogs under covarion-like and constant-but-different models of amino acid evolution. Mol Biol Evol 2010; 27:2618-27. [PMID: 20551039 PMCID: PMC2955734 DOI: 10.1093/molbev/msq149] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 12/14/2022]
Abstract
Functional divergence between homologous proteins is expected to affect amino acid sequences in two main ways, which can be considered as proxies of biochemical divergence: a “covarion-like” pattern of correlated changes in evolutionary rates, and switches in conserved residues (“conserved but different”). Although these patterns have been used in case studies, a large-scale analysis is needed to estimate their frequency and distribution. We use a phylogenomic framework of animal genes to answer three questions: 1) What is the prevalence of such patterns? 2) Can we link such patterns at the amino acid level with selection inferred at the codon level? 3) Are patterns different between paralogs and orthologs? We find that covarion-like patterns are more frequently detected than “constant but different,” but that only the latter are correlated with signal for positive selection. Finally, there is no obvious difference in patterns between orthologs and paralogs.
Affiliation(s)
- Romain A Studer
  - Department of Ecology and Evolution, Biophore, University of Lausanne, Lausanne, Switzerland