1
Weber CC, Perron U, Casey D, Yang Z, Goldman N. Ambiguity Coding Allows Accurate Inference of Evolutionary Parameters from Alignments in an Aggregated State-Space. Syst Biol 2020; 70:21-32. [PMID: 32353118 PMCID: PMC7744038 DOI: 10.1093/sysbio/syaa036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2019] [Revised: 03/20/2020] [Accepted: 03/30/2020] [Indexed: 11/14/2022] Open
Abstract
How can we best learn the history of a protein’s evolution? Ideally, a model of sequence evolution should capture both the process that generates genetic variation and the functional constraints determining which changes are fixed. However, in practical terms the most suitable approach may simply be the one that combines the convenience of easily available input data with the ability to return useful parameter estimates. For example, we might be interested in a measure of the strength of selection (typically obtained using a codon model) or an ancestral structure (obtained using structural modeling based on inferred amino acid sequence and side chain configuration). But what if data in the relevant state-space are not readily available? We show that it is possible to obtain accurate estimates of the outputs of interest using an established method for handling missing data. Encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of models with the desired features to data that lack the resolution that is normally required. This strategy is viable because the evolutionary path taken through the observed space contains information about states that were likely visited in the “unseen” state-space. To illustrate this, we consider two examples with amino acid sequences as input. We show that $\omega$, a parameter describing the relative strength of selection on nonsynonymous and synonymous changes, can be estimated in an unbiased manner using an adapted version of a standard 61-state codon model. Using simulated and empirical data, we find that ancestral amino acid side chain configuration can be inferred by applying a 55-state empirical model to 20-state amino acid data. Where feasible, combining inputs from both ambiguity-coded and fully resolved data improves accuracy. Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in remarkable ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. These examples show that our methods permit the recovery of evolutionary information from sequences where it has previously been inaccessible. [Ancestral reconstruction; natural selection; protein structure; state-spaces; substitution models.]
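The ambiguity-coding idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it only assumes the standard genetic code (NCBI translation table 1) and the common likelihood convention of giving every compatible state a tip likelihood of 1. The function names `compatible_codons`, `tip_vector`, and `encode_protein` are hypothetical, chosen for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons run TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The 61 sense codons form the larger ("unseen") state-space.
SENSE_CODONS = sorted(c for c, aa in CODON_TO_AA.items() if aa != "*")


def compatible_codons(amino_acid):
    """Return every sense codon that translates to the observed amino acid."""
    return [c for c in SENSE_CODONS if CODON_TO_AA[c] == amino_acid]


def tip_vector(amino_acid):
    """Indicator vector over the 61 codons: 1.0 where the codon is compatible."""
    codons = set(compatible_codons(amino_acid))
    if not codons:
        # Assumption: gaps or unknown residues are treated as fully missing data.
        return [1.0] * len(SENSE_CODONS)
    return [1.0 if c in codons else 0.0 for c in SENSE_CODONS]


def encode_protein(sequence):
    """Ambiguity-code a protein sequence, one tip vector per site."""
    return [tip_vector(aa) for aa in sequence]


if __name__ == "__main__":
    for aa in "MWL":
        print(aa, compatible_codons(aa))
```

Vectors of this kind could serve as tip likelihoods for a 61-state codon model, so that an amino acid alignment is analysed as if each residue were an ambiguous codon. How the published method parameterizes or adapts the codon model is not shown here.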
Affiliation(s)
- Claudia C Weber
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Umberto Perron
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Dearbhaile Casey
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ziheng Yang
  - Department of Genetics, University College London, London WC1E 6BT, UK
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
2
Abstract
Substitutions between chemically distant amino acids are known to occur less frequently than those between more similar amino acids. This knowledge, however, is not reflected in most codon substitution models, which treat all nonsynonymous changes as if they were equivalent in terms of impact on the protein. A variety of methods for integrating chemical distances into models have been proposed, with a common approach being to divide substitutions into radical or conservative categories. Nevertheless, it remains unclear whether the resulting models describe sequence evolution better than their simpler counterparts. We propose a parametric codon model that distinguishes between radical and conservative substitutions, allowing us to assess if radical substitutions are preferentially removed by selection. Applying our new model to a range of phylogenomic data, we find differentiating between radical and conservative substitutions provides significantly better fit for large populations, but see no equivalent improvement for smaller populations. Comparing codon and amino acid models using these same data shows that alignments from large populations tend to select phylogenetic models containing information about amino acid exchangeabilities, whereas the structure of the genetic code is more important for smaller populations. Our results suggest selection against radical substitutions is, on average, more pronounced in large populations than smaller ones. The reduced observable effect of selection in smaller populations may be due to stronger genetic drift making it more challenging to detect preferences. Our results imply an important connection between the life history of a phylogenetic group and the model that best describes its evolution.
Affiliation(s)
- Claudia C Weber
  - Center for Computational Genetics and Genomics, Department of Biology, Temple University, Philadelphia, PA
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Simon Whelan
  - Evolutionary Biology Center, Uppsala University, Uppsala, Sweden
3
Arenas M, Weber CC, Liberles DA, Bastolla U. ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability. Syst Biol 2018; 66:1054-1064. [PMID: 28057858 DOI: 10.1093/sysbio/syw121] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/10/2016] [Accepted: 12/28/2016] [Indexed: 11/12/2022] Open
Abstract
The computational reconstruction of ancestral proteins provides information on past biological events and has practical implications for biomedicine and biotechnology. Currently available tools for ancestral sequence reconstruction (ASR) are often based on empirical amino acid substitution models that assume that all sites evolve at the same rate and under the same process. However, this assumption is frequently violated because protein evolution is highly heterogeneous due to different selective constraints among sites. Here, we present ProtASR, a new evolutionary framework to infer ancestral protein sequences accounting for selection on protein stability. First, ProtASR generates site-specific substitution matrices through the structurally constrained mean-field (MF) substitution model, which considers both unfolding and misfolding stability. We previously showed that MF models outperform empirical amino acid substitution models, as well as other structurally constrained substitution models, both in terms of likelihood and correctly inferring amino acid distributions across sites. In the second step, ProtASR adapts a well-established maximum-likelihood (ML) ASR procedure to infer ancestral proteins under MF models. A known bias of ML ASR methods is that they tend to overestimate the stability of ancestral proteins by underestimating the frequency of deleterious mutations. We compared ProtASR under MF to two empirical substitution models (JTT and CAT), reconstructing the ancestral sequences of simulated proteins. ProtASR yields reconstructed proteins with less biased stabilities, which are significantly closer to those of the simulated proteins. Analysis of extant protein families suggests that folding stability evolves through time across protein families, potentially reflecting neutral fluctuation. Some families exhibit a more constant protein folding stability, while others are more variable. 
ProtASR is freely available from https://github.com/miguelarenas/protasr and includes detailed documentation and ready-to-use examples. It runs in seconds/minutes depending on protein length and alignment size. [Ancestral sequence reconstruction; folding stability; molecular adaptation; phylogenetics; protein evolution; protein structure.]
Affiliation(s)
- Miguel Arenas
  - Instituto de Investigação e Inovação em Saúde (i3S), University of Porto, Porto, Portugal
  - Institute of Molecular Pathology and Immunology of the University of Porto (IPATIMUP), Porto, Portugal
  - Centre for Molecular Biology Severo Ochoa (CBMSO), Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
  - Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
- Claudia C Weber
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122, USA
- David A Liberles
  - Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA 19122, USA
- Ugo Bastolla
  - Centre for Molecular Biology Severo Ochoa (CBMSO), Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
4
Abstract
That population size affects the fate of new mutations arising in genomes, modulating both how frequently they arise and how efficiently natural selection is able to filter them, is well established. It is therefore clear that these distinct roles for population size that characterize different processes should affect the evolution of proteins and need to be carefully defined. Empirical evidence is consistent with a role for demography in influencing protein evolution, supporting the idea that functional constraints alone do not determine the composition of coding sequences. Given that the relationship between population size, mutant fitness and fixation probability has been well characterized, estimating fitness from observed substitutions is well within reach with well-formulated models. Molecular evolution research has, therefore, increasingly begun to leverage concepts from population genetics to quantify the selective effects associated with different classes of mutation. However, in order for this type of analysis to provide meaningful information about the intra- and inter-specific evolution of coding sequences, a clear definition of concepts of population size, what they influence, and how they are best parameterized is essential. Here, we present an overview of the many distinct concepts that “population size” and “effective population size” may refer to, what they represent for studying proteins, and how this knowledge can be harnessed to produce better specified models of protein evolution.
Affiliation(s)
- Alexander Platt
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- Claudia C Weber
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
5
Chi PB, Kim D, Lai JK, Bykova N, Weber CC, Kubelka J, Liberles DA. A new parameter-rich structure-aware mechanistic model for amino acid substitution during evolution. Proteins 2017; 86:218-228. [PMID: 29178386 DOI: 10.1002/prot.25429] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/25/2017] [Revised: 11/14/2017] [Accepted: 11/22/2017] [Indexed: 02/06/2023]
Abstract
Improvements in the description of amino acid substitution are required to develop better pseudo-energy-based protein structure-aware models for use in phylogenetic studies. These models are used to characterize the probabilities of amino acid substitution and enable better simulation of protein sequences over a phylogeny. A better characterization of amino acid substitution probabilities in turn enables numerous downstream applications, like detecting positive selection, ancestral sequence reconstruction, and evolutionarily-motivated protein engineering. Many existing Markov models for amino acid substitution in molecular evolution disregard molecular structure and describe the amino acid substitution process over longer evolutionary periods poorly. Here, we present a new model upgraded with a site-specific parameterization of pseudo-energy terms in a coarse-grained force field, which describes local heterogeneity in physical constraints on amino acid substitution better than a previous pseudo-energy-based model with minimum cost in runtime. The importance of each weight term parameterization in characterizing underlying features of the site, including contact number, solvent accessibility, and secondary structural elements was evaluated, returning both expected and biologically reasonable relationships between model parameters. This results in the acceptance of proposed amino acid substitutions that more closely resemble those observed site-specific frequencies in gene family alignments. The modular site-specific pseudo-energy function is made available for download through the following website: https://liberles.cst.temple.edu/Software/CASS/index.html.
Affiliation(s)
- Peter B Chi
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
  - Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426
- Dohyup Kim
  - Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071
- Jason K Lai
  - Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071
- Nadia Bykova
  - Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071
  - Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow, 119234, Russia
- Claudia C Weber
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
- Jan Kubelka
  - Department of Chemistry, University of Wyoming, Laramie, Wyoming, 82071
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
  - Department of Molecular Biology, University of Wyoming, Laramie, Wyoming, 82071
6
Daud NMAN, Bakis E, Hallett JP, Weber CC, Welton T. Evidence for the spontaneous formation of N-heterocyclic carbenes in imidazolium based ionic liquids. Chem Commun (Camb) 2017; 53:11154-11156. [PMID: 28890962 DOI: 10.1039/c7cc06112a] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 01/14/2023]
Abstract
We present a study of the reactions of aldehydes in ionic liquids which gives evidence for the spontaneous formation of N-heterocyclic carbenes in ionic liquids based on 1,3-dialkyl substituted imidazolium cations from the lack of a deuterium isotope effect on the reaction of these ionic liquids with aldehydes.
Affiliation(s)
- N M A N Daud
  - Department of Chemistry, Imperial College London, London, SW7 2AZ, UK
7
Weber CC, Kunov-Kruse AJ, Rogers RD, Myerson AS. Manipulation of ionic liquid anion-solute-antisolvent interactions for the purification of acetaminophen. Chem Commun (Camb) 2015; 51:4294-7. [PMID: 25673089 DOI: 10.1039/c5cc00198f] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 01/01/2023]
Abstract
Hydrogen bond donating cosolvents have been shown to significantly reduce the solubility of acetaminophen (AAP) in ionic liquids containing the acetate anion. Reduced solubility arises from competition for solvation by the acetate anion and can be used for the design of advanced separation techniques, illustrated by the crystallization of AAP.
Affiliation(s)
- C C Weber
  - Novartis-MIT Center for Continuous Manufacturing and Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
8
Mugal CF, Weber CC, Ellegren H. GC-biased gene conversion links the recombination landscape and demography to genomic base composition. Bioessays 2015; 37:1317-26. [DOI: 10.1002/bies.201500058] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Indexed: 12/22/2022]
Affiliation(s)
- Carina F. Mugal
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
- Claudia C. Weber
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
  - Department of Biology; Center for Computational Genetics and Genomics; Temple University; Philadelphia PA USA
- Hans Ellegren
  - Department of Evolutionary Biology; Evolutionary Biology Centre; Uppsala University; Uppsala Sweden
9
Weber CC, Nabholz B, Romiguier J, Ellegren H. Kr/Kc but not dN/dS correlates positively with body mass in birds, raising implications for inferring lineage-specific selection. Genome Biol 2015; 15:542. [PMID: 25607475 PMCID: PMC4264323 DOI: 10.1186/s13059-014-0542-8] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 05/30/2014] [Accepted: 11/13/2014] [Indexed: 02/02/2023] Open
Abstract
Background: The ratio of the rates of non-synonymous and synonymous substitution (dN/dS) is commonly used to estimate selection in coding sequences. It is often suggested that, all else being equal, dN/dS should be lower in populations with large effective size (Ne) due to increased efficacy of purifying selection. As Ne is difficult to measure directly, life history traits such as body mass, which is typically negatively associated with population size, have commonly been used as proxies in empirical tests of this hypothesis. However, evidence of whether the expected positive correlation between body mass and dN/dS is consistently observed is conflicting. Results: Employing whole genome sequence data from 48 avian species, we assess the relationship between rates of molecular evolution and life history in birds. We find a negative correlation between dN/dS and body mass, contrary to nearly neutral expectation. This raises the question whether the correlation might be a method artefact. We therefore in turn consider non-stationary base composition, divergence time and saturation as possible explanations, but find no clear patterns. However, in striking contrast to dN/dS, the ratio of radical to conservative amino acid substitutions (Kr/Kc) correlates positively with body mass. Conclusions: Our results in principle accord with the notion that non-synonymous substitutions causing radical amino acid changes are more efficiently removed by selection in large populations, consistent with nearly neutral theory. These findings have implications for the use of dN/dS and suggest that caution is warranted when drawing conclusions about lineage-specific modes of protein evolution using this metric.
10
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SYW, Faircloth BC, Nabholz B, Howard JT, Suh A, Weber CC, da Fonseca RR, Alfaro-Núñez A, Narula N, Liu L, Burt D, Ellegren H, Edwards SV, Stamatakis A, Mindell DP, Cracraft J, Braun EL, Warnow T, Jun W, Gilbert MTP, Zhang G. Phylogenomic analyses data of the avian phylogenomics project. Gigascience 2015; 4:4. [PMID: 25741440 PMCID: PMC4349222 DOI: 10.1186/s13742-014-0038-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 11/26/2014] [Accepted: 12/16/2014] [Indexed: 11/10/2022] Open
Abstract
Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Affiliation(s)
- Erich D Jarvis
  - Department of Neurobiology, Howard Hughes Medical Institute and Duke University Medical Center, Durham, NC 27710 USA
- Siavash Mirarab
  - Department of Computer Science, The University of Texas at Austin, Austin, TX 78712 USA
- Andre J Aberer
  - Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Bo Li
  - China National GeneBank, BGI-Shenzhen, Shenzhen, 518083 China
  - College of Medicine and Forensics, Xi'an Jiaotong University, Xi'an, 710061 China
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Peter Houde
  - Department of Biology, New Mexico State University, Las Cruces, NM 88003 USA
- Cai Li
  - China National GeneBank, BGI-Shenzhen, Shenzhen, 518083 China
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Simon Y W Ho
  - School of Biological Sciences, University of Sydney, Sydney, NSW 2006 Australia
- Brant C Faircloth
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, CA 90095 USA
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803 USA
- Benoit Nabholz
  - CNRS UMR 5554, Institut des Sciences de l'Evolution de Montpellier, Université Montpellier II, Montpellier, France
- Jason T Howard
  - Department of Neurobiology, Howard Hughes Medical Institute and Duke University Medical Center, Durham, NC 27710 USA
- Alexander Suh
  - Department of Evolutionary Biology, Uppsala University, SE-752 36 Uppsala, Sweden
- Claudia C Weber
  - Department of Evolutionary Biology, Uppsala University, SE-752 36 Uppsala, Sweden
- Rute R da Fonseca
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Alonzo Alfaro-Núñez
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
- Nitish Narula
  - Department of Biology, New Mexico State University, Las Cruces, NM 88003 USA
  - Biodiversity and Biocomplexity Unit, Okinawa Institute of Science and Technology Onna-son, Okinawa, 904-0495 Japan
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, 30602 USA
- Dave Burt
  - Department of Genomics and Genetics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG UK
- Hans Ellegren
  - Department of Evolutionary Biology, Uppsala University, SE-752 36 Uppsala, Sweden
- Scott V Edwards
  - Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, MA USA
- Alexandros Stamatakis
  - Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
  - Institute of Theoretical Informatics, Department of Informatics, Karlsruhe Institute of Technology, D-76131 Karlsruhe, Germany
- David P Mindell
  - Department of Biochemistry & Biophysics, University of California, San Francisco, CA 94158 USA
- Joel Cracraft
  - Department of Ornithology, American Museum of Natural History, New York, NY 10024 USA
- Edward L Braun
  - Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611 USA
- Tandy Warnow
  - Department of Computer Science, The University of Texas at Austin, Austin, TX 78712 USA
- Wang Jun
  - China National GeneBank, BGI-Shenzhen, Shenzhen, 518083 China
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
  - Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
  - Macau University of Science and Technology, Avenida Wai long, Taipa, Macau, 999078 China
  - Department of Medicine, University of Hong Kong, Hong Kong, Hong Kong
- M Thomas Pius Gilbert
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
  - Trace and Environmental DNA Laboratory Department of Environment and Agriculture, Curtin University, Perth, WA 6102 Australia
- Guojie Zhang
  - China National GeneBank, BGI-Shenzhen, Shenzhen, 518083 China
  - Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark
11
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SYW, Faircloth BC, Nabholz B, Howard JT, Suh A, Weber CC, da Fonseca RR, Li J, Zhang F, Li H, Zhou L, Narula N, Liu L, Ganapathy G, Boussau B, Bayzid MS, Zavidovych V, Subramanian S, Gabaldón T, Capella-Gutiérrez S, Huerta-Cepas J, Rekepalli B, Munch K, Schierup M, Lindow B, Warren WC, Ray D, Green RE, Bruford MW, Zhan X, Dixon A, Li S, Li N, Huang Y, Derryberry EP, Bertelsen MF, Sheldon FH, Brumfield RT, Mello CV, Lovell PV, Wirthlin M, Schneider MPC, Prosdocimi F, Samaniego JA, Vargas Velazquez AM, Alfaro-Núñez A, Campos PF, Petersen B, Sicheritz-Ponten T, Pas A, Bailey T, Scofield P, Bunce M, Lambert DM, Zhou Q, Perelman P, Driskell AC, Shapiro B, Xiong Z, Zeng Y, Liu S, Li Z, Liu B, Wu K, Xiao J, Yinqi X, Zheng Q, Zhang Y, Yang H, Wang J, Smeds L, Rheindt FE, Braun M, Fjeldsa J, Orlando L, Barker FK, Jønsson KA, Johnson W, Koepfli KP, O'Brien S, Haussler D, Ryder OA, Rahbek C, Willerslev E, Graves GR, Glenn TC, McCormack J, Burt D, Ellegren H, Alström P, Edwards SV, Stamatakis A, Mindell DP, Cracraft J, Braun EL, Warnow T, Jun W, Gilbert MTP, Zhang G. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 2014; 346:1320-31. [PMID: 25504713 PMCID: PMC4405904 DOI: 10.1126/science.1253451] [Citation(s) in RCA: 1095] [Impact Index Per Article: 109.5] [Indexed: 12/12/2022]
Abstract
To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
Affiliation(s)
- Erich D Jarvis
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA.
| | - Siavash Mirarab
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
| | - Andre J Aberer
- Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
| | - Bo Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China. College of Medicine and Forensics, Xi'an Jiaotong University Xi'an 710061, China. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Peter Houde
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA
| | - Cai Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
| | - Brant C Faircloth
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA. Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Benoit Nabholz
- CNRS UMR 5554, Institut des Sciences de l'Evolution de Montpellier, Université Montpellier II Montpellier, France
| | - Jason T Howard
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA
| | - Alexander Suh
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
| | - Claudia C Weber
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
| | - Rute R da Fonseca
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Jianwen Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Fang Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Hui Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Long Zhou
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Nitish Narula
- Department of Biology, New Mexico State University, Las Cruces, NM 88003, USA. Biodiversity and Biocomplexity Unit, Okinawa Institute of Science and Technology Onna-son, Okinawa 904-0495, Japan
| | - Liang Liu
- Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
| | - Ganesh Ganapathy
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA
| | - Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Université de Lyon, F-69622 Villeurbanne, France
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
| | - Volodymyr Zavidovych
- Department of Neurobiology, Howard Hughes Medical Institute (HHMI), and Duke University Medical Center, Durham, NC 27710, USA
| | - Sankar Subramanian
- Environmental Futures Research Institute, Griffith University, Nathan, Queensland 4111, Australia
| | - Toni Gabaldón
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain. Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
| | - Salvador Capella-Gutiérrez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Jaime Huerta-Cepas
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Dr. Aiguader 88, 08003 Barcelona, Spain. Universitat Pompeu Fabra, Barcelona, Spain
| | - Bhanu Rekepalli
- Joint Institute for Computational Sciences, The University of Tennessee, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
| | - Kasper Munch
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
| | - Mikkel Schierup
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
| | - Bent Lindow
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Wesley C Warren
- The Genome Institute, Washington University School of Medicine, St Louis, MO 63108, USA
| | - David Ray
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, MS 39762, USA. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, MS 39762, USA. Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Richard E Green
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA
| | - Michael W Bruford
- Organisms and Environment Division, Cardiff School of Biosciences, Cardiff University Cardiff CF10 3AX, Wales, UK
| | - Xiangjiang Zhan
- Organisms and Environment Division, Cardiff School of Biosciences, Cardiff University Cardiff CF10 3AX, Wales, UK. Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Andrew Dixon
- International Wildlife Consultants, Carmarthen SA33 5YL, Wales, UK
| | - Shengbin Li
- College of Medicine and Forensics, Xi'an Jiaotong University Xi'an, 710061, China
| | - Ning Li
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing 100094, China
| | - Yinhua Huang
- State Key Laboratory for Agrobiotechnology, China Agricultural University, Beijing 100094, China
| | - Elizabeth P Derryberry
- Department of Ecology and Evolutionary Biology, Tulane University, New Orleans, LA 70118, USA. Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Mads Frost Bertelsen
- Center for Zoo and Wild Animal Health, Copenhagen Zoo Roskildevej 38, DK-2000 Frederiksberg, Denmark
| | - Frederick H Sheldon
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Robb T Brumfield
- Museum of Natural Science and Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA. Brazilian Avian Genome Consortium (CNPq/FAPESPA-SISBIO Aves), Federal University of Para, Belem, Para, Brazil
| | - Peter V Lovell
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA
| | - Morgan Wirthlin
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR 97239, USA
| | - Maria Paula Cruz Schneider
- Brazilian Avian Genome Consortium (CNPq/FAPESPA-SISBIO Aves), Federal University of Para, Belem, Para, Brazil. Institute of Biological Sciences, Federal University of Para, Belem, Para, Brazil
| | - Francisco Prosdocimi
- Brazilian Avian Genome Consortium (CNPq/FAPESPA-SISBIO Aves), Federal University of Para, Belem, Para, Brazil. Institute of Medical Biochemistry Leopoldo de Meis, Federal University of Rio de Janeiro, Rio de Janeiro RJ 21941-902, Brazil
| | - José Alfredo Samaniego
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Amhed Missael Vargas Velazquez
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Alonzo Alfaro-Núñez
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Paula F Campos
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Bent Petersen
- Centre for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark Kemitorvet 208, 2800 Kgs Lyngby, Denmark
| | - Thomas Sicheritz-Ponten
- Centre for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark Kemitorvet 208, 2800 Kgs Lyngby, Denmark
| | - An Pas
- Breeding Centre for Endangered Arabian Wildlife, Sharjah, United Arab Emirates
| | - Tom Bailey
- Dubai Falcon Hospital, Dubai, United Arab Emirates
| | - Paul Scofield
- Canterbury Museum Rolleston Avenue, Christchurch 8050, New Zealand
| | - Michael Bunce
- Trace and Environmental DNA Laboratory Department of Environment and Agriculture, Curtin University, Perth, Western Australia 6102, Australia
| | - David M Lambert
- Environmental Futures Research Institute, Griffith University, Nathan, Queensland 4111, Australia
| | - Qi Zhou
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | - Polina Perelman
- Laboratory of Genomic Diversity, National Cancer Institute Frederick, MD 21702, USA. Institute of Molecular and Cellular Biology, SB RAS and Novosibirsk State University, Novosibirsk, Russia
| | - Amy C Driskell
- Smithsonian Institution National Museum of Natural History, Washington, DC 20013, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA
| | - Zijun Xiong
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Yongli Zeng
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Shiping Liu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Zhenyu Li
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Binghang Liu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Kui Wu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Jin Xiao
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Xiong Yinqi
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Qiuemei Zheng
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | - Yong Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
| | | | - Jian Wang
- BGI-Shenzhen, Shenzhen 518083, China
| | - Linnea Smeds
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
| | - Frank E Rheindt
- Department of Biological Sciences, National University of Singapore, Republic of Singapore
| | - Michael Braun
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Suitland, MD 20746, USA
| | - Jon Fjeldsa
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark
| | - Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - F Keith Barker
- Bell Museum of Natural History, University of Minnesota, Saint Paul, MN 55108, USA
| | - Knud Andreas Jønsson
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD, UK. Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Warren Johnson
- Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, VA 22630, USA
| | - Klaus-Peter Koepfli
- Smithsonian Conservation Biology Institute, National Zoological Park, Washington, DC 20008, USA
| | - Stephen O'Brien
- Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia 199004. Oceanographic Center, Nova Southeastern University, Ft Lauderdale, FL 33004, USA
| | - David Haussler
- Center for Biomolecular Science and Engineering, UCSC, Santa Cruz, CA 95064, USA
| | - Oliver A Ryder
- San Diego Zoo Institute for Conservation Research, Escondido, CA 92027, USA
| | - Carsten Rahbek
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. Department of Life Sciences, Imperial College London, Silwood Park Campus, Ascot SL5 7PY, UK
| | - Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark
| | - Gary R Graves
- Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark. Department of Vertebrate Zoology, MRC-116, National Museum of Natural History, Smithsonian Institution, Washington, DC 20013, USA
| | - Travis C Glenn
- Department of Environmental Health Science, University of Georgia, Athens, GA 30602, USA
| | - John McCormack
- Moore Laboratory of Zoology and Department of Biology, Occidental College, Los Angeles, CA 90041, USA
| | - Dave Burt
- Department of Genomics and Genetics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian EH25 9RG, UK
| | - Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, SE-752 36 Uppsala, Sweden
| | - Per Alström
- Swedish Species Information Centre, Swedish University of Agricultural Sciences Box 7007, SE-750 07 Uppsala, Sweden. Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Alexandros Stamatakis
- Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany. Institute of Theoretical Informatics, Department of Informatics, Karlsruhe Institute of Technology, D- 76131 Karlsruhe, Germany
| | - David P Mindell
- Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94158, USA
| | - Joel Cracraft
- Department of Ornithology, American Museum of Natural History, New York, NY 10024, USA
| | - Edward L Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, FL 32611, USA
| | - Tandy Warnow
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA. Departments of Bioengineering and Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Wang Jun
- BGI-Shenzhen, Shenzhen 518083, China. Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark. Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah 21589, Saudi Arabia. Macau University of Science and Technology, Avenida Wai long, Taipa, Macau 999078, China. Department of Medicine, University of Hong Kong, Hong Kong.
| | - M Thomas P Gilbert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5-7, 1350 Copenhagen, Denmark. Trace and Environmental DNA Laboratory Department of Environment and Agriculture, Curtin University, Perth, Western Australia 6102, Australia.
| | - Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China. Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark.
|
12
|
Suh A, Weber CC, Kehlmaier C, Braun EL, Green RE, Fritz U, Ray DA, Ellegren H. Early Mesozoic coexistence of amniotes and Hepadnaviridae. PLoS Genet 2014; 10:e1004559. [PMID: 25501991 PMCID: PMC4263362 DOI: 10.1371/journal.pgen.1004559] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 03/18/2014] [Accepted: 06/24/2014] [Indexed: 12/16/2022]
Abstract
Hepadnaviridae are double-stranded DNA viruses that infect some species of birds and mammals. This includes humans, where hepatitis B viruses (HBVs) are prevalent pathogens in considerable parts of the global population. Recently, endogenized sequences of HBVs (eHBVs) have been discovered in bird genomes where they constitute direct evidence for the coexistence of these viruses and their hosts from the late Mesozoic until present. Nevertheless, virtually nothing is known about the ancient host range of this virus family in other animals. Here we report the first eHBVs from crocodilian, snake, and turtle genomes, including a turtle eHBV that endogenized >207 million years ago. This genomic “fossil” is >125 million years older than the oldest avian eHBV and provides the first direct evidence that Hepadnaviridae already existed during the Early Mesozoic. This implies that the Mesozoic fossil record of HBV infection spans three of the five major groups of land vertebrates, namely birds, crocodilians, and turtles. We show that the deep phylogenetic relationships of HBVs are largely congruent with the deep phylogeny of their amniote hosts, which suggests an ancient amniote–HBV coexistence and codivergence, at least since the Early Mesozoic. Notably, the organization of overlapping genes as well as the structure of elements involved in viral replication has remained highly conserved among HBVs along that time span, except for the presence of the X gene. We provide multiple lines of evidence that the tumor-promoting X protein of mammalian HBVs lacks a homolog in all other hepadnaviruses and propose a novel scenario for the emergence of X via segmental duplication and overprinting of pre-existing reading frames in the ancestor of mammalian HBVs. Our study reveals an unforeseen host range of prehistoric HBVs and provides novel insights into the genome evolution of hepadnaviruses throughout their long-lasting association with amniote hosts. 
Viruses are not known to leave physical fossil traces, which makes our understanding of their evolutionary prehistory crucially dependent on the detection of endogenous viruses. Ancient endogenous viruses, also known as paleoviruses, are relics of viral genomes or fragments thereof that once infiltrated their host's germline and then remained as molecular “fossils” within the host genome. The massive genome sequencing of recent years has unearthed vast numbers of paleoviruses from various animal genomes, including the first endogenous hepatitis B viruses (eHBVs) in bird genomes. We screened genomes of land vertebrates (amniotes) for the presence of paleoviruses and identified ancient eHBVs in the recently sequenced genomes of crocodilians, snakes, and turtles. We report an eHBV that is >207 million years old, making it the oldest endogenous virus currently known. Furthermore, our results provide direct evidence that the Hepadnaviridae virus family infected birds, crocodilians and turtles during the Mesozoic Era, and suggest a long-lasting coexistence of these viruses and their amniote hosts at least since the Early Mesozoic. We challenge previous views on the origin of the oncogenic X gene and provide an evolutionary explanation as to why only mammalian hepatitis B infection leads to hepatocellular carcinoma.
Affiliation(s)
- Alexander Suh
- Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
| | - Claudia C. Weber
- Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
| | - Christian Kehlmaier
- Museum of Zoology, Senckenberg Research Institute and Natural History Museum, Dresden, Germany
| | - Edward L. Braun
- Department of Biology and Genetics Institute, University of Florida, Gainesville, Florida, United States of America
| | - Richard E. Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, United States of America
| | - Uwe Fritz
- Museum of Zoology, Senckenberg Research Institute and Natural History Museum, Dresden, Germany
| | - David A. Ray
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Mississippi State, Mississippi, United States of America
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, Mississippi, United States of America
| | - Hans Ellegren
- Department of Evolutionary Biology (EBC), Uppsala University, Uppsala, Sweden
|
13
|
Weber CC, Boussau B, Romiguier J, Jarvis ED, Ellegren H. Evidence for GC-biased gene conversion as a driver of between-lineage differences in avian base composition. Genome Biol 2014; 15:549. [PMID: 25496599 PMCID: PMC4290106 DOI: 10.1186/s13059-014-0549-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 04/08/2014] [Accepted: 11/19/2014] [Indexed: 11/23/2022]
Abstract
BACKGROUND: While effective population size (Ne) and life history traits such as generation time are known to impact substitution rates, their potential effects on base composition evolution are less well understood. GC content increases with decreasing body mass in mammals, consistent with recombination-associated GC biased gene conversion (gBGC) more strongly impacting these lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition.
RESULTS: Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages.
CONCLUSIONS: Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time; that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.
Affiliation(s)
- Claudia C Weber
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
| | - Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558 Villeurbanne, France
| | | | - Erich D Jarvis
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC USA
| | - Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
|
14
|
Abstract
Several reports from mammals indicate that an increase in the mutation rate in late-replicating regions may, in part, be responsible for the observed genomic heterogeneity in neutral substitution rates and levels of diversity, although the mechanisms for this remain poorly understood. Recent evidence also suggests that late replication is associated with high mutability in yeast. This then raises the question as to whether a similar effect is operating across all eukaryotes. Limited evidence from one chromosome arm in Drosophila melanogaster suggests the opposite pattern, with regions overlapping early-firing origins showing increased levels of diversity and divergence. Given the availability of genome-wide replication timing profiles for D. melanogaster, we now return to this issue. Consistent with what is seen in other taxa, we find that divergence at synonymous sites in exon cores, as well as divergence at putatively unconstrained intronic sites, is elevated in late-replicating regions. Analysis of genes with low codon usage bias suggests a ∼30% difference in mutation rate between the earliest and the latest replicating sequence. Intronic sequence suggests a more modest difference. We additionally show that an increase in diversity in late-replicating sequences is not owing to replication timing covarying with the local recombination rate. If anything, the effects of recombination mask the impact of replication timing. We conclude that, contrary to prior reports and consistent with what is seen in mammals and yeast, there is indeed a relationship between rates of nucleotide divergence and diversity and replication timing that is consistent with an increase in the mutation rate during late S-phase in D. melanogaster. It is therefore plausible that such an effect might be common among eukaryotes. The result may have implications for the inference of positive selection.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
15
|
Weber CC, Hurst LD. Support for multiple classes of local expression clusters in Drosophila melanogaster, but no evidence for gene order conservation. Genome Biol 2011; 12:R23. [PMID: 21414197 PMCID: PMC3129673 DOI: 10.1186/gb-2011-12-3-r23] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 01/14/2011] [Revised: 03/04/2011] [Accepted: 03/17/2011] [Indexed: 01/12/2023]
Abstract
BACKGROUND: Gene order in eukaryotic genomes is not random, with genes with similar expression profiles tending to cluster. In yeasts, the model taxon for gene order analysis, such syntenic clusters of non-homologous genes tend to be conserved over evolutionary time. Whether similar clusters show gene order conservation in other lineages is, however, undecided. Here, we examine this issue in Drosophila melanogaster using high-resolution chromosome rearrangement data.
RESULTS: We show that D. melanogaster has at least three classes of expression clusters: first, as observed in mammals, large clusters of functionally unrelated housekeeping genes; second, small clusters of functionally related highly co-expressed genes; and finally, as previously defined by Spellman and Rubin, larger domains of co-expressed but functionally unrelated genes. The latter are, however, not independent of the small co-expression clusters and likely reflect a methodological artifact. While the small co-expression and housekeeping/essential gene clusters resemble those observed in yeast, in contrast to yeast, we see no evidence that any of the three cluster types are preserved as synteny blocks. If anything, adjacent co-expressed genes are more likely to become rearranged than expected. Again in contrast to yeast, in D. melanogaster, gene pairs with short intergene distance or in divergent orientations tend to have higher rearrangement rates. These findings are consistent with co-expression being partly due to shared chromatin environment.
CONCLUSIONS: We conclude that, while similar in terms of cluster types, gene order evolution has strikingly different patterns in yeasts and in D. melanogaster, although recombination is associated with gene order rearrangement in both.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, BA2 7AY, UK
|
16
|
Weber CC, Hurst LD. Intronic AT skew is a defendable proxy for germline transcription but does not predict crossing-over or protein evolution rates in Drosophila melanogaster. J Mol Evol 2010; 71:415-26. [PMID: 20938653 DOI: 10.1007/s00239-010-9395-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/24/2010] [Accepted: 09/17/2010] [Indexed: 01/28/2023]
Abstract
Recent evidence suggests that germline transcription may affect both protein evolutionary rates, possibly mediated by repair processes, and recombination rates, possibly mediated by chromatin and epigenetic modification. Here, we test these propositions in Drosophila melanogaster. The challenge for such analyses is to provide defendable measures of germline gene expression. Intronic AT skew is a good candidate measure as it is thought to be a consequence, at least in part, of transcription-coupled repair. Prior evidence suggests that intronic AT skew in D. melanogaster is not affected by proximity to intron extremities and differs between transcribed DNA and flanking sequence. We now also establish that intronic AT skew is a defendable proxy for germline expression as (a) it is more similar than expected by chance between introns of the same gene (which is not accounted for by physical proximity), (b) is correlated with male germline expression, and (c) is more pronounced in broadly expressed genes. Furthermore, (d) a trend for intronic skew to differ between 3' and 5' ends of genes is particular to broadly expressed genes. Finally, (e) controlling for physical distance, introns of proximate genes are most different in skew if they have different tissue specificity. We find that intronic AT skew, employed as a proxy for germline transcription, correlates neither with recombination rates nor with the rate of protein evolution. We conclude that there is no prima facie evidence that germline expression modulates recombination rates or monotonically affects protein evolution rates in D. melanogaster.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, UK
|
17
|
Weber CC, Hurst LD. Protein rates of evolution are predicted by double-strand break events, independent of crossing-over rates. Genome Biol Evol 2009; 1:340-9. [PMID: 20333203 PMCID: PMC2817428 DOI: 10.1093/gbe/evp033] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Accepted: 08/28/2009] [Indexed: 12/22/2022]
Abstract
Theory predicts that, owing to reduced Hill–Robertson interference, genomic regions with high crossing-over rates should experience more efficient selection. In Saccharomyces cerevisiae a negative correlation between the local recombination rate, assayed as meiotic double-strand breaks (DSBs), and the local rate of protein evolution has been considered consistent with such a model. Although DSBs are a prerequisite for crossing-over, they need not result in crossing-over. With recent high-resolution crossover data, we now return to this issue comparing two species of yeast. Strikingly, even allowing for crossover rates, both the rate of premeiotic DSBs and of noncrossover recombination events predict a gene's rate of evolution. This both questions the validity of prior analyses and strongly suggests that any correlation between crossover rates and rates of protein evolution could be owing to slow-evolving genes being prone to DSBs or a direct effect of DSBs on sequence evolution. To ask if classical theory of recombination has any relevance, we determine whether crossover rates predict rates of protein evolution, controlling for noncrossover DSB events, gene ontology (GO) class, gene expression, protein abundance, nucleotide content, and dispensability. We find that genes with high crossing-over rates have low rates of protein evolution after such control, although any correlation is weaker than that previously reported considering meiotic DSBs as a proxy. The data are consistent both with recombination enhancing the efficiency of purifying selection and, independently, with DSBs being associated with low rates of evolution.
Affiliation(s)
- Claudia C Weber
- Department of Biology and Biochemistry, University of Bath, Bath, Somerset, UK
|
18
|
Schuessel K, Frey C, Jourdan C, Keil U, Weber CC, Müller-Spahn F, Müller WE, Eckert A. Aging sensitizes toward ROS formation and lipid peroxidation in PS1M146L transgenic mice. Free Radic Biol Med 2006; 40:850-62. [PMID: 16520237 DOI: 10.1016/j.freeradbiomed.2005.10.041] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.1] [Received: 06/24/2005] [Revised: 09/13/2005] [Accepted: 10/10/2005] [Indexed: 02/07/2023]
Abstract
Mutations in the presenilins (PS) account for the majority of familial Alzheimer disease (FAD) cases. To test the hypothesis that oxidative stress can underlie the deleterious effects of presenilin mutations, we analyzed lipid peroxidation products (4-hydroxynonenal (HNE) and malondialdehyde) and antioxidant defenses in brain tissue and levels of reactive oxygen species (ROS) in splenic lymphocytes from transgenic mice bearing human PS1 with the M146L mutation (PS1M146L) compared to those from mice transgenic for wild-type human PS1 (PS1wt) and nontransgenic littermate control mice. In brain tissue, HNE levels were increased only in aged (19-22 months) PS1M146L transgenic animals compared to PS1wt mice and not in young (3-4 months) or middle-aged mice (13-15 months). Similarly, in splenic lymphocytes expressing the transgenic PS1 proteins, mitochondrial and cytosolic ROS levels were elevated to 142.1 and 120.5% relative to controls only in cells from aged PS1M146L animals. Additionally, brain tissue HNE levels were positively correlated with mitochondrial ROS levels in splenic lymphocytes, indicating that oxidative stress can be detected in different tissues of PS1 transgenic mice. Antioxidant defenses (activities of antioxidant enzymes Cu/Zn-SOD, GPx, or GR) or susceptibility to in vitro oxidative stimulation was unaltered. In summary, these results demonstrate that the PS1M146L mutation increases mitochondrial ROS formation and oxidative damage in aged mice. Hence, oxidative stress caused by the combined effects of aging and PS1 mutations may be causative for triggering neurodegenerative events in FAD patients.
Affiliation(s)
- Katrin Schuessel
- Department of Pharmacology, Biocentre, University of Frankfurt, Germany.
|
19
|
Weber CC, Kressmann S, Ott M, Fricker G, Müller WE. Inhibition of P-glycoprotein function by several antidepressants may not contribute to clinical efficacy. Pharmacopsychiatry 2006; 38:293-300. [PMID: 16342001 DOI: 10.1055/s-2005-916184] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 10/25/2022]
Abstract
INTRODUCTION In many depressive patients the negative feedback mechanism of the HPA (hypothalamic-pituitary-adrenocortical) axis is impaired. It has been suggested that antidepressants inhibit membrane glucocorticoid transporters like P-Glycoprotein (Pgp) and hence enhance the intracellular glucocorticoid concentration, leading to an increased glucocorticoid-receptor mediated gene transcription and therefore to normalization of the function of the HPA axis. The aim of this study is to investigate inhibition of Pgp by several different antidepressants. METHODS We characterized the inhibitory potencies of the antidepressants in two in vitro assays using calcein-AM as the Pgp substrate. The two cell systems expressing Pgp were: 1. PBCEC (porcine brain capillary endothelial cells) as a model for the blood-brain barrier, and 2. a human lymphocytic leukaemia cell line, CEM, and the multi-drug-resistant (MDR) cell line VLB-100, expressing Pgp, as a model for the human protein. RESULTS All of the antidepressants tested inhibit the transport of calcein-AM by Pgp in the micromolar range. DISCUSSION Because this inhibition is only seen at concentrations above therapeutically relevant plasma levels, their effect may not play a role in the mechanism of action of the antidepressants tested.
Affiliation(s)
- C C Weber
- Department of Pharmacology, Biocenter, University of Frankfurt, Frankfurt, Germany
|
20
|
Schmitt-Schillig S, Schaffer S, Weber CC, Eckert GP, Müller WE. Flavonoids and the aging brain. J Physiol Pharmacol 2005; 56 Suppl 1:23-36. [PMID: 15800383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2005] [Accepted: 02/15/2005] [Indexed: 05/02/2023]
Abstract
As in all other organs, the functional capacity of the human brain deteriorates over time. Pathological events such as oxidative stress, due to the elevated release of free radicals and reactive oxygen or nitrogen species, the subsequently enhanced oxidative modification of lipids, proteins, and nucleic acids, and the modulation of apoptotic signaling pathways contribute to loss of brain function. The identification of neuroprotective food components is one strategy to facilitate healthy brain aging. Flavonoids were shown to activate key enzymes in mitochondrial respiration and to protect neuronal cells by acting as antioxidants, thus breaking the vicious cycle of oxidative stress and tissue damage. Furthermore, recent data indicate a favorable effect of flavonoids on neuro-inflammatory events. Whereas most of these effects have been shown in vitro, limited data in vivo are available, suggesting a rather low penetration of flavonoids into the brain. Nevertheless, several reports support the concept that flavonoid intake inhibits certain biochemical processes of brain aging, and might thus prevent to some extent the decline of cognitive functions with aging as well as the development or the course of neurodegenerative diseases. However, more data are needed to assess the true impact of flavonoids on brain aging.
Affiliation(s)
- S Schmitt-Schillig
- Institute of Pharmacology (ZAFES), Biocenter Niederursel, University of Frankfurt, Frankfurt am Main, Germany
|
21
|
Weber CC, Kressmann S, Fricker G, Müller WE. Modulation of P-glycoprotein function by St John's wort extract and its major constituents. Pharmacopsychiatry 2005; 37:292-8. [PMID: 15551196 DOI: 10.1055/s-2004-832686] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 10/26/2022]
Abstract
INTRODUCTION Recent data suggest some relevant drug interactions caused by St John's wort extract, which can be explained by interactions with the Cytochrome P450 system or P-Glycoprotein (Pgp). Interaction with Pgp, including activation, inhibition and induction, can lead to altered plasma or brain levels of Pgp substrates. The aim of the present study was to investigate the possible interactions of St John's wort extract and its most relevant constituents with the transport activity of Pgp. METHODS We characterized the modulatory potencies in two in vitro assays using calcein-AM, first in VLB cells (a human lymphocytic leukemia cell line expressing Pgp) and second in PBCEC cells (porcine brain capillary endothelial cells). RESULTS The extract, as well as some of the tested constituents, modulates the transport by Pgp in the micromolar range. Quercetin and hyperforin seem to be most potent. CONCLUSIONS These findings suggest the possibility of drug interactions at the level of the gastro-intestinal absorption of drugs. Plasma levels of the constituents of St John's wort are very likely too low to interfere with Pgp at the blood-brain barrier, with the possible exception of quercetin.
Affiliation(s)
- C C Weber
- Department of Pharmacology, Biocenter, University of Frankfurt, Marie-Curie-Strasse 9, 60439 Frankfurt, Germany
|
22
|
Eckert GP, Keller JH, Weber CC, Franke C, Peters I, Karas M, Schubert-Zsilavecz M, Müller WE. P4-365 Brain availability and effects on lipid homeostasis of statins. Neurobiol Aging 2004. [DOI: 10.1016/s0197-4580(04)81923-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
|