51
Canale AS, Venev SV, Whitfield TW, Caffrey DR, Marasco WA, Schiffer CA, Kowalik TF, Jensen JD, Finberg RW, Zeldovich KB, Wang JP, Bolon DNA. Synonymous Mutations at the Beginning of the Influenza A Virus Hemagglutinin Gene Impact Experimental Fitness. J Mol Biol 2018; 430:1098-1115. [PMID: 29466705 DOI: 10.1016/j.jmb.2018.02.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/20/2017] [Revised: 01/19/2018] [Accepted: 02/05/2018] [Indexed: 01/15/2023]
Abstract
The fitness effects of synonymous mutations can provide insights into biological and evolutionary mechanisms. We analyzed the experimental fitness effects of all single-nucleotide mutations, including synonymous substitutions, at the beginning of the influenza A virus hemagglutinin (HA) gene. Many synonymous substitutions were deleterious both in bulk competition and for individually isolated clones. Investigating protein and RNA levels of a subset of individually expressed HA variants revealed that multiple biochemical properties contribute to the observed experimental fitness effects. Our results indicate that a structural element in the HA segment viral RNA may influence fitness. Examination of naturally evolved sequences in human hosts indicates a preference for the unfolded state of this structural element compared to that found in swine hosts. Our overall results reveal that synonymous mutations may have greater fitness consequences than indicated by simple models of sequence conservation, and we discuss the implications of this finding for commonly used evolutionary tests and analyses.
Affiliation(s)
- Aneth S Canale
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Sergey V Venev
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Troy W Whitfield
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA; Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Daniel R Caffrey
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Wayne A Marasco
  - Department of Cancer Immunology & Virology, Dana-Farber Cancer Institute, Harvard Medical School, 450 Brookline Avenue, Boston, MA 02215, USA
- Celia A Schiffer
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Timothy F Kowalik
  - Department of Microbiology and Physiological Systems, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Jeffrey D Jensen
  - School of Life Sciences, Center for Evolution & Medicine, Arizona State University, Tempe, AZ 85281, USA
- Robert W Finberg
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Konstantin B Zeldovich
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Jennifer P Wang
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA 01655, USA
- Daniel N A Bolon
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
52
Parto S, Lartillot N. Molecular adaptation in Rubisco: Discriminating between convergent evolution and positive selection using mechanistic and classical codon models. PLoS One 2018; 13:e0192697. [PMID: 29432438 PMCID: PMC5809049 DOI: 10.1371/journal.pone.0192697] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/06/2017] [Accepted: 01/29/2018] [Indexed: 11/19/2022]
Abstract
Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase) is the most important enzyme on earth, catalyzing the first step of photosynthetic CO2 fixation. Without it, there would be no storing of the sun's energy in plants. Molecular adaptation of Rubisco to the C4 photosynthetic pathway has attracted a lot of attention. C4 plants, which comprise less than 5% of land plants, have evolved more efficient photosynthesis compared to C3 plants. Interestingly, a large number of independent transitions from the C3 to the C4 phenotype have occurred. Each time, the Rubisco enzyme has been subject to similar changes in selective pressure, thus providing an excellent model for convergent evolution at the molecular level. Molecular adaptation is often identified with positive selection and is typically characterized by an elevated ratio of non-synonymous to synonymous substitution rate (dN/dS). However, convergent adaptation is expected to leave a different molecular signature, taking the form of repeated transitions toward identical or similar amino acids. Here, we used a previously introduced codon-based differential-selection model to detect and quantify consistent patterns of convergent adaptation in Rubisco in eudicots. We further contrasted our results with those obtained by classical codon models based on the estimation of dN/dS. We found that the two classes of models tend to select distinct, although overlapping, sets of positions. This discrepancy in the results illustrates the conceptual difference between these models while emphasizing the need to better discriminate between qualitatively different selective regimes, by using a broader class of codon models than those currently considered in molecular evolutionary studies.
Affiliation(s)
- Sahar Parto
  - Department of Biochemistry and Molecular Medicine, Université de Montreal, Montreal, Quebec, Canada
- Nicolas Lartillot
  - Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, CNRS, UMR, Lyon, France
53
Platt A, Weber CC, Liberles DA. Protein evolution depends on multiple distinct population size parameters. BMC Evol Biol 2018; 18:17. [PMID: 29422024 PMCID: PMC5806465 DOI: 10.1186/s12862-017-1085-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/25/2017] [Accepted: 11/20/2017] [Indexed: 01/08/2023]
Abstract
That population size affects the fate of new mutations arising in genomes, modulating both how frequently they arise and how efficiently natural selection is able to filter them, is well established. It is therefore clear that these distinct roles for population size that characterize different processes should affect the evolution of proteins and need to be carefully defined. Empirical evidence is consistent with a role for demography in influencing protein evolution, supporting the idea that functional constraints alone do not determine the composition of coding sequences. Given that the relationship between population size, mutant fitness and fixation probability has been well characterized, estimating fitness from observed substitutions is well within reach with well-formulated models. Molecular evolution research has, therefore, increasingly begun to leverage concepts from population genetics to quantify the selective effects associated with different classes of mutation. However, in order for this type of analysis to provide meaningful information about the intra- and inter-specific evolution of coding sequences, a clear definition of concepts of population size, what they influence, and how they are best parameterized is essential. Here, we present an overview of the many distinct concepts that “population size” and “effective population size” may refer to, what they represent for studying proteins, and how this knowledge can be harnessed to produce better specified models of protein evolution.
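The two roles of population size mentioned in this abstract can be illustrated with a small sketch. It assumes Kimura's standard diffusion approximation for a new mutation in a diploid population, which is textbook population genetics and not necessarily the parameterization the authors recommend. The function names, and the split into a census size `n_census` (mutation supply) and an effective size `n_e` (efficiency of selection), are illustrative assumptions.

```python
import math


def fixation_probability(s, n_e, n_census=None):
    """Kimura's approximation for a new mutation (initial frequency 1/(2N)).

    s        : selection coefficient of the mutant (assumed additive)
    n_e      : effective population size (governs drift vs. selection)
    n_census : census size (sets the initial frequency); defaults to n_e
    """
    if n_census is None:
        n_census = n_e
    if abs(s) < 1e-12:
        # Neutral limit: fixation probability equals the initial frequency.
        return 1.0 / (2.0 * n_census)
    try:
        numerator = -math.expm1(-2.0 * s * n_e / n_census)
        denominator = -math.expm1(-4.0 * n_e * s)
    except OverflowError:
        # Strongly deleterious in a large population: effectively never fixes.
        return 0.0
    return numerator / denominator


def substitution_rate(mu, s, n_e, n_census):
    """Mutation supply (2 * N * mu) times the fixation probability."""
    return 2.0 * n_census * mu * fixation_probability(s, n_e, n_census)
```

In this sketch the census size cancels for neutral mutations, so `substitution_rate(mu, 0.0, n_e, n_census)` reduces to `mu`, while for selected mutations the result depends on both size parameters.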
Affiliation(s)
- Alexander Platt
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- Claudia C Weber
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
54
Bai J, Xu S, Nie Z, Wang Y, Zhu C, Wang Y, Min W, Cai Y, Zou J, Zhou X. The complete mitochondrial genome of Huananpotamon lichuanense (Decapoda: Brachyura) with phylogenetic implications for freshwater crabs. Gene 2018; 646:217-226. [PMID: 29307851 DOI: 10.1016/j.gene.2018.01.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/28/2017] [Revised: 11/22/2017] [Accepted: 01/03/2018] [Indexed: 10/18/2022]
Abstract
In the present study, we determined the complete mitochondrial genome of Huananpotamon lichuanense (Decapoda: Brachyura) for the first time. The genome is 15,380 bp in length and typically consists of 37 genes. When the gene order was compared to the ancestral crustacean type, two tRNA genes (tRNAHis and tRNAGln) were rearranged in H. lichuanense, and the translocation of tRNAGln appeared only in Potamoidea crabs, such as Geothelphusa dehaani and Sinopotamon xiushuiense, supporting the monophyly of the Potamoidea superfamily. Thirteen protein-coding genes and 2 rRNA genes were divided into five complexes to perform the phylogenetic analysis, and the results showed that the trees constructed by complex I (ND1-ND6 and ND4L), complex IV (COX1-COX3) and rRNA genes better accord with the morphological classification system, suggesting that molecular markers of higher-level phylogeny can be developed in these three complexes in the future. The estimated divergence time for freshwater crabs is approximately 133.58 Ma, and G. dehaani from Japan diverged from the freshwater crabs of mainland China approximately 60.66 Ma. A selective pressure analysis based on current data revealed clearly increasing dN/dS ratios (except for ATP6 and ND4L) of freshwater crabs, and the accumulation of nonsynonymous mutations suggests that terrestrial habitats provide a relatively relaxed selective pressure environment for this group.
Affiliation(s)
- Jun Bai
  - Research lab of Freshwater Crustacean Decapoda & Paragonimus, School of Basic Medical Sciences, Nanchang University, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Shuxin Xu
  - Research lab of Freshwater Crustacean Decapoda & Paragonimus, School of Basic Medical Sciences, Nanchang University, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Zongheng Nie
  - Research lab of Freshwater Crustacean Decapoda & Paragonimus, School of Basic Medical Sciences, Nanchang University, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Yifan Wang
  - Institute of Pathogen Biology, Jiangxi Academy of Medical Sciences, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Chunchao Zhu
  - Research lab of Freshwater Crustacean Decapoda & Paragonimus, School of Basic Medical Sciences, Nanchang University, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Yan Wang
  - Research lab of Freshwater Crustacean Decapoda & Paragonimus, School of Basic Medical Sciences, Nanchang University, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Weiping Min
  - Institute of Pathogen Biology, Jiangxi Academy of Medical Sciences, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Yixiong Cai
  - National Biodiversity Centre, National Parks Board, 1 Cluny Road, Singapore 259569, Republic of Singapore
- Jiexin Zou
  - Research lab of Freshwater Crustacean Decapoda & Paragonimus, School of Basic Medical Sciences, Nanchang University, 461 Bayi Avenue, Nanchang City, Jiangxi Province 330006, People's Republic of China
- Xianmin Zhou
  - Key Laboratory of Poyang Lake Environment and Resource Utilization, Ministry of Education, Nanchang University, 1299 Xuefu Avenue, Nanchang City, Jiangxi Province 330031, People's Republic of China
55
Hilton SK, Doud MB, Bloom JD. phydms: software for phylogenetic analyses informed by deep mutational scanning. PeerJ 2017; 5:e3657. [PMID: 28785526 PMCID: PMC5541924 DOI: 10.7717/peerj.3657] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/20/2017] [Accepted: 07/15/2017] [Indexed: 11/30/2022]
Abstract
It has recently become possible to experimentally measure the effects of all amino-acid point mutations to proteins using deep mutational scanning. These experimental measurements can inform site-specific phylogenetic substitution models of gene evolution in nature. Here we describe software that efficiently performs analyses with such substitution models. This software, phydms, can be used to compare the results of deep mutational scanning experiments to the selection on genes in nature. Given a phylogenetic tree topology inferred with another program, phydms enables rigorous comparison of how well different experiments on the same gene capture actual natural selection. It also enables re-scaling of deep mutational scanning data to account for differences in the stringency of selection in the lab and nature. Finally, phydms can identify sites that are evolving differently in nature than expected from experiments in the lab. As data from deep mutational scanning experiments become increasingly widespread, phydms will facilitate quantitative comparison of the experimental results to the actual selection pressures shaping evolution in nature.
Affiliation(s)
- Sarah K Hilton
  - Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, United States of America
- Michael B Doud
  - Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, United States of America; Medical Scientist Training Program, University of Washington, Seattle, WA, United States of America
- Jesse D Bloom
  - Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, United States of America
56
Jones CT, Youssef N, Susko E, Bielawski JP. Shifting Balance on a Static Mutation-Selection Landscape: A Novel Scenario of Positive Selection. Mol Biol Evol 2017; 34:391-407. [PMID: 28110273 DOI: 10.1093/molbev/msw237] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/26/2023]
Abstract
A version of the mechanistic mutation-selection (MutSel) model that accounts for temporal dynamics at a site is presented. This is used to show that the rate ratio dN/dS at a site can be transiently >1 even when fitness coefficients are fixed or the fitness landscape is static. This occurs whenever a site drifts away from its fitness peak and is then forced back by selection, a process reminiscent of shifting balance. Shifting balance is strongest when the substitution process is not dominated by selection or drift, but admits interplay between the two. Under this condition, site-specific changes in dN/dS were inferred in 78-100% of trials, and positive selection (i.e., dN/dS>1) in 10-40% of trials, when sequence alignments generated under MutSel were fitted to two popular phenomenological branch-site models. These results demonstrate that positive selection can occur without a change in fitness regime, and that this is detectable by branch-site models. In addition, MutSel is used to show that a site can be occupied by a sub-optimal amino acid for long periods on a fixed landscape when selection is stringent. This has implications for the interpretation of constant-but-different site patterns typically attributed to changes in fitness. Furthermore, a version of MutSel with episodic changes in fitness coefficients is used to illustrate systematic differences between parameters used to generate data under MutSel and their counterparts estimated by a simple codon model. Motivated by a discrepancy in the literature, interpretation of dN/dS in the context of MutSel is also discussed.
Affiliation(s)
- Christopher T Jones
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia
- Noor Youssef
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia; Center for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia
- Joseph P Bielawski
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia; Department of Biology, Dalhousie University, Halifax, Nova Scotia; Center for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia
57
Sydykova DK, Wilke CO. Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates. PeerJ 2017; 5:e3391. [PMID: 28584717 PMCID: PMC5452972 DOI: 10.7717/peerj.3391] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 01/19/2017] [Accepted: 05/08/2017] [Indexed: 11/20/2022]
Abstract
Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN∕dS ratio. For amino-acid sequences, one widely-used method is called Rate4Site, and it assigns a relative conservation score to each site in an alignment. How site-wise dN∕dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN∕dS, using either dN∕dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN∕dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN∕dS, and the correlation strengths increase in alignments with greater sequence divergence and more taxa. Moreover, Rate4Site scores correlate very well with inferred (as opposed to true) dN∕dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN∕dS in a variety of empirical datasets. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences.
Affiliation(s)
- Dariya K Sydykova
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
- Claus O Wilke
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
58
Bloom JD. Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 2017; 12:1. [PMID: 28095902 PMCID: PMC5240389 DOI: 10.1186/s13062-016-0172-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 08/15/2016] [Accepted: 12/14/2016] [Indexed: 12/23/2022]
Abstract
Background: Sites of positive selection are identified by comparing observed evolutionary patterns to those expected under a null model for evolution in the absence of such selection. For protein-coding genes, the most common null model is that nonsynonymous and synonymous mutations fix at equal rates; this unrealistic model has limited power to detect many interesting forms of selection. Results: I describe a new approach that uses a null model based on experimental measurements of a gene’s site-specific amino-acid preferences generated by deep mutational scanning in the lab. This null model makes it possible to identify both diversifying selection for repeated amino-acid change and differential selection for mutations to amino acids that are unexpected given the measurements made in the lab. I show that this approach identifies sites of adaptive substitutions in four genes (lactamase, Gal4, influenza nucleoprotein, and influenza hemagglutinin) far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions. Conclusions: As rapid increases in biological data enable increasingly nuanced descriptions of the constraints on individual protein sites, approaches like the one here can improve our ability to identify many interesting forms of selection in natural sequences. Reviewers: This article was reviewed by Sebastian Maurer-Stroh, Olivier Tenaillon, and Tal Pupko. All three reviewers are members of the Biology Direct editorial board.
Affiliation(s)
- Jesse D Bloom
  - Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 98109, WA, USA
59
Thiltgen G, Dos Reis M, Goldstein RA. Finding Direction in the Search for Selection. J Mol Evol 2016; 84:39-50. [PMID: 27913840 PMCID: PMC5253163 DOI: 10.1007/s00239-016-9765-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 02/29/2016] [Accepted: 11/10/2016] [Indexed: 11/24/2022]
Abstract
Tests for positive selection have mostly been developed to look for diversifying selection where change away from the current amino acid is often favorable. However, in many cases we are interested in directional selection where there is a shift toward specific amino acids, resulting in increased fitness in the species. Recently, a few methods have been developed to detect and characterize directional selection on a molecular level. Using the results of evolutionary simulations as well as HIV drug resistance data as models of directional selection, we compare two such methods with each other, as well as against a standard method for detecting diversifying selection. We find that the method to detect diversifying selection also detects directional selection under certain conditions. One method developed for detecting directional selection is powerful and accurate for a wide range of conditions, while the other can generate an excessive number of false positives.
Affiliation(s)
- Grant Thiltgen
  - Institute of Child Health, University College London, London, UK
- Mario Dos Reis
  - The School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
60
Rodrigue N, Lartillot N. Detecting Adaptation in Protein-Coding Genes Using a Bayesian Site-Heterogeneous Mutation-Selection Codon Substitution Model. Mol Biol Evol 2016; 34:204-214. [PMID: 27744408 PMCID: PMC5854120 DOI: 10.1093/molbev/msw220] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 02/04/2023]
Abstract
Codon substitution models have traditionally attempted to uncover signatures of adaptation within protein-coding genes by contrasting the rates of synonymous and non-synonymous substitutions. Another modeling approach, known as the mutation–selection framework, attempts to explicitly account for selective patterns at the amino acid level, with some approaches allowing for heterogeneity in these patterns across codon sites. Under such a model, substitutions at a given position occur at the neutral or nearly neutral rate when they are synonymous, or when they correspond to replacements between amino acids of similar fitness; substitutions from high to low (low to high) fitness amino acids have comparatively low (high) rates. Here, we study the use of such a mutation–selection framework as a null model for the detection of adaptation. Following previous works in this direction, we include a deviation parameter that has the effect of capturing the surplus, or deficit, in non-synonymous rates, relative to what would be expected under a mutation–selection modeling framework that includes a Dirichlet process approach to account for across-codon-site variation in amino acid fitness profiles. We use simulations, along with a few real data sets, to study the behavior of the approach, and find it to have good power with a low false-positive rate. Altogether, we emphasize the potential of recent mutation–selection models in the detection of adaptation, calling for further model refinements as well as large-scale applications.
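The rate behavior described in this abstract (neutral rate for synonymous or equal-fitness changes, lower rates toward less fit amino acids, higher rates toward fitter ones) can be sketched as follows. This is a minimal illustration that assumes the commonly used mutation-selection fixation factor S / (1 - exp(-S)) for a scaled fitness difference S, with a multiplicative deviation parameter applied to nonsynonymous changes. The names `fixation_factor`, `substitution_rate`, and `omega_star` are hypothetical, and the exact parameterization in the paper may differ.

```python
import math


def fixation_factor(scaled_s):
    """Fixation rate relative to neutral for a scaled fitness difference S.

    Uses S / (1 - exp(-S)), which tends to 1 as S -> 0, exceeds 1 for
    S > 0, and falls below 1 for S < 0.
    """
    if abs(scaled_s) < 1e-8:
        return 1.0 + scaled_s / 2.0  # first-order expansion around S = 0
    if scaled_s < -700.0:
        return 0.0  # avoid overflow; the rate is negligible here
    return scaled_s / (-math.expm1(-scaled_s))


def substitution_rate(mutation_rate, fitness_from, fitness_to,
                      synonymous, omega_star=1.0):
    """Codon substitution rate under this sketch of the framework.

    Synonymous changes proceed at the mutation rate. Nonsynonymous changes
    are scaled by the fixation factor and by omega_star, where
    omega_star > 1 indicates a surplus of nonsynonymous substitutions
    relative to the mutation-selection expectation.
    """
    if synonymous:
        return mutation_rate
    return mutation_rate * omega_star * fixation_factor(fitness_to - fitness_from)
```

With `omega_star` fixed at 1 the sketch reduces to a plain mutation-selection model, which is the sense in which that model acts as the null.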
Affiliation(s)
- Nicolas Rodrigue
  - Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, Canada
- Nicolas Lartillot
  - Université de Lyon, Laboratoire de Biométrie, Biologie Évolutive, Villeurbanne, France
61
Spielman SJ, Wan S, Wilke CO. A Comparison of One-Rate and Two-Rate Inference Frameworks for Site-Specific dN/dS Estimation. Genetics 2016; 204:499-511. [PMID: 27535929 PMCID: PMC5068842 DOI: 10.1534/genetics.115.185264] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 12/01/2015] [Accepted: 08/11/2016] [Indexed: 11/18/2022]
Abstract
Two broad paradigms exist for inferring dN/dS, the ratio of nonsynonymous to synonymous substitution rates, from coding sequences: (i) a one-rate approach, where dN/dS is represented with a single parameter, or (ii) a two-rate approach, where dN and dS are estimated separately. The performances of these two approaches have been well studied in the specific context of proper model specification, i.e., when the inference model matches the simulation model. By contrast, the relative performances of one-rate vs. two-rate parameterizations when applied to data generated according to a different mechanism remain unclear. Here, we compare the relative merits of one-rate and two-rate approaches in the specific context of model misspecification by simulating alignments with mutation-selection models rather than with dN/dS-based models. We find that one-rate frameworks generally infer more accurate dN/dS point estimates, even when dS varies among sites. In other words, modeling dS variation may substantially reduce accuracy of dN/dS point estimates. These results appear to depend on the selective constraint operating at a given site. For sites under strong purifying selection ([Formula: see text]), one-rate and two-rate models show comparable performances. However, one-rate models significantly outperform two-rate models for sites under moderate-to-weak purifying selection. We attribute this distinction to the fact that, for these more quickly evolving sites, a given substitution is more likely to be nonsynonymous than synonymous. The data will therefore be relatively enriched for nonsynonymous changes, and modeling dS contributes excessive noise to dN/dS estimates. We additionally find that high levels of divergence among sequences, rather than the number of sequences in the alignment, are more critical for obtaining precise point estimates.
Affiliation(s)
- Stephanie J Spielman
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas, Austin, Texas 78712
- Suyang Wan
  - School of Physics and Astronomy, The University of Minnesota, Minneapolis, Minnesota 55455
- Claus O Wilke
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas, Austin, Texas 78712
62
Cheng RR, Nordesjö O, Hayes RL, Levine H, Flores SC, Onuchic JN, Morcos F. Connecting the Sequence-Space of Bacterial Signaling Proteins to Phenotypes Using Coevolutionary Landscapes. Mol Biol Evol 2016; 33:3054-3064. [PMID: 27604223 PMCID: PMC5100047 DOI: 10.1093/molbev/msw188] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/08/2023]
Abstract
Two-component signaling (TCS) is the primary means by which bacteria sense and respond to the environment. TCS involves two partner proteins working in tandem, which interact to perform cellular functions while limiting interactions with non-partners (i.e., cross-talk). We construct a Potts model for TCS that can quantitatively predict how mutating amino acid identities affect the interaction between TCS partners and non-partners. The parameters of this model are inferred directly from protein sequence data. This approach drastically reduces the computational complexity of exploring the sequence-space of TCS proteins. As a stringent test, we compare its predictions to a recent comprehensive mutational study, which characterized the functionality of 204 mutational variants of the PhoQ kinase in Escherichia coli. We find that our best predictions accurately reproduce the amino acid combinations found in experiment, which enable functional signaling with its partner PhoP. These predictions demonstrate the evolutionary pressure to preserve the interaction between TCS partners as well as prevent unwanted cross-talk. Further, we calculate the mutational change in the binding affinity between PhoQ and PhoP, providing an estimate of the amount of destabilization needed to disrupt TCS.
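As a rough illustration of how a Potts model scores a sequence and a point mutation, the sketch below evaluates the usual Potts energy, H = -sum_i h_i(a_i) - sum_{i<j} J_ij(a_i, a_j), on toy parameters. The sign convention, the dictionary layout, and all numeric values are assumptions made for this example. In the study the fields and couplings are inferred from protein sequence data, which this sketch does not attempt.

```python
def potts_energy(seq, fields, couplings):
    """Potts energy of a sequence: lower values mean more favorable.

    fields    : list of dicts, fields[i][a] = h_i(a)
    couplings : dict keyed by (i, j) with i < j, each value a dict
                mapping (a, b) to J_ij(a, b); missing entries count as 0
    """
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= fields[i].get(a, 0.0)
        for j in range(i + 1, len(seq)):
            energy -= couplings.get((i, j), {}).get((a, seq[j]), 0.0)
    return energy


def mutation_delta(seq, pos, new_residue, fields, couplings):
    """Energy change caused by substituting one residue at position pos."""
    mutant = seq[:pos] + new_residue + seq[pos + 1:]
    return potts_energy(mutant, fields, couplings) - potts_energy(seq, fields, couplings)


# Toy two-site model with hypothetical parameters.
toy_fields = [{"A": 1.0, "G": 0.0}, {"L": 0.5, "V": 0.0}]
toy_couplings = {(0, 1): {("A", "L"): 2.0}}
wild_type_energy = potts_energy("AL", toy_fields, toy_couplings)
delta_a_to_g = mutation_delta("AL", 0, "G", toy_fields, toy_couplings)
```

A positive `delta_a_to_g` means the mutant is less favorable than the starting sequence under this toy model, which is the kind of quantity one would compare across partner and non-partner pairings.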
Collapse
Affiliation(s)
- R R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, TX
| | - O Nordesjö
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
| | - R L Hayes
- Department of Biophysics, University of Michigan, Ann Arbor, MI
| | - H Levine
- Center for Theoretical Biological Physics, Rice University, Houston, TX; Department of Bioengineering, Rice University, Houston, TX
| | - S C Flores
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
| | - J N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX; Department of Physics and Astronomy, Rice University, Houston, TX; Department of Chemistry, and Biosciences, Rice University, Houston, TX
| | - F Morcos
- Department of Biological Sciences and Center for Systems Biology, University of Texas at Dallas, Dallas, TX
|
63
|
Pouyet F, Bailly-Bechet M, Mouchiroud D, Guéguen L. SENCA: A Multilayered Codon Model to Study the Origins and Dynamics of Codon Usage. Genome Biol Evol 2016; 8:2427-41. [PMID: 27401173 PMCID: PMC5010899 DOI: 10.1093/gbe/evw165] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022] Open
Abstract
Gene sequences are the target of evolution operating at different levels, including the nucleotide, codon, and amino acid levels. Disentangling the impact of those different levels on gene sequences requires developing a probabilistic model with three layers. Here we present SENCA (site evolution of nucleotides, codons, and amino acids), a codon substitution model that separately describes 1) nucleotide processes that apply to all sites of a sequence, such as the mutational bias, 2) preferences between synonymous codons, and 3) preferences among amino acids. We argue that most synonymous substitutions are not neutral and that SENCA provides more accurate estimates of selection compared with more classical codon sequence models. We study the forces that drive the evolution of genomic content, intraspecifically in the core genome of 21 prokaryotes and interspecifically for five Enterobacteria. We recover a universal mutational bias toward AT and find that taking into account selection on synonymous codon usage affects the measurement of selection on nonsynonymous substitutions. We also confirm that codon usage bias is mostly driven by selection on preferred codons. We propose new summary statistics to measure the relative importance of the different evolutionary processes acting on sequences.
Affiliation(s)
- Fanny Pouyet
- Laboratoire de Biologie et Biométrie Evolutive, University Claude Bernard Lyon 1-University of Lyon, Villeurbanne, France
| | - Marc Bailly-Bechet
- Laboratoire de Biologie et Biométrie Evolutive, University Claude Bernard Lyon 1-University of Lyon, Villeurbanne, France
| | - Dominique Mouchiroud
- Laboratoire de Biologie et Biométrie Evolutive, University Claude Bernard Lyon 1-University of Lyon, Villeurbanne, France
| | - Laurent Guéguen
- Laboratoire de Biologie et Biométrie Evolutive, University Claude Bernard Lyon 1-University of Lyon, Villeurbanne, France
|
64
|
Spielman SJ, Wilke CO. Extensively Parameterized Mutation-Selection Models Reliably Capture Site-Specific Selective Constraint. Mol Biol Evol 2016; 33:2990-3002. [PMID: 27512115 DOI: 10.1093/molbev/msw171] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 02/07/2023] Open
Abstract
The mutation-selection model of coding sequence evolution has received renewed attention for its use in estimating site-specific amino acid propensities and selection coefficient distributions. Two computationally tractable mutation-selection inference frameworks have been introduced: One framework employs a fixed-effects, highly parameterized maximum likelihood approach, whereas the other employs a random-effects Bayesian Dirichlet Process approach. While both implementations follow the same model, they appear to make distinct predictions about the distribution of selection coefficients. The fixed-effects framework estimates a large proportion of highly deleterious substitutions, whereas the random-effects framework estimates that all substitutions are either nearly neutral or weakly deleterious. It remains unknown, however, how accurately each method infers evolutionary constraints at individual sites. Indeed, selection coefficient distributions pool all site-specific inferences, thereby obscuring a precise assessment of site-specific estimates. Therefore, in this study, we use a simulation-based strategy to determine how accurately each approach recapitulates the selective constraint at individual sites. We find that the fixed-effects approach, despite its extensive parameterization, consistently and accurately estimates site-specific evolutionary constraint. By contrast, the random-effects Bayesian approach systematically underestimates the strength of natural selection, particularly for slowly evolving sites. We also find that, despite the strong differences between their inferred selection coefficient distributions, the fixed- and random-effects approaches yield surprisingly similar inferences of site-specific selective constraint. We conclude that the fixed-effects mutation-selection framework provides the more reliable software platform for model application and future development.
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX; Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX; Present address: Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX; Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
|
65
|
Jackson EL, Shahmoradi A, Spielman SJ, Jack BR, Wilke CO. Intermediate divergence levels maximize the strength of structure-sequence correlations in enzymes and viral proteins. Protein Sci 2016; 25:1341-53. [PMID: 26971720 PMCID: PMC4918415 DOI: 10.1002/pro.2920] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/07/2015] [Accepted: 03/04/2016] [Indexed: 12/16/2022]
Abstract
Structural properties such as solvent accessibility and contact number predict site-specific sequence variability in many proteins. However, the strength and significance of these structure-sequence relationships vary widely among different proteins, with absolute correlation strengths ranging from 0 to 0.8. In particular, two recent works have made contradictory observations. Yeh et al. (Mol. Biol. Evol. 31:135-139, 2014) found that both relative solvent accessibility (RSA) and weighted contact number (WCN) are good predictors of sitewise evolutionary rate in enzymes, with WCN clearly out-performing RSA. Shahmoradi et al. (J. Mol. Evol. 79:130-142, 2014) considered these same predictors (as well as others) in viral proteins and found much weaker correlations and no clear advantage of WCN over RSA. Because these two studies had substantial methodological differences, however, a direct comparison of their results is not possible. Here, we reanalyze the datasets of the two studies with one uniform analysis pipeline, and we find that many apparent discrepancies between the two analyses can be attributed to the extent of sequence divergence in individual alignments. Specifically, the alignments of the enzyme dataset are much more diverged than those of the virus dataset, and proteins with higher divergence exhibit, on average, stronger structure-sequence correlations. However, the highest structure-sequence correlations are observed at intermediate divergence levels, where both highly conserved and highly variable sites are present in the same alignment.
Affiliation(s)
- Eleisha L Jackson
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
| | - Amir Shahmoradi
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
- Department of Physics, The University of Texas at Austin, Austin, Texas, 78712
| | - Stephanie J Spielman
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
| | - Benjamin R Jack
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
|
66
|
van Schooten B, Jiggins CD, Briscoe AD, Papa R. Genome-wide analysis of ionotropic receptors provides insight into their evolution in Heliconius butterflies. BMC Genomics 2016; 17:254. [PMID: 27004525 PMCID: PMC4804616 DOI: 10.1186/s12864-016-2572-y] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 10/06/2015] [Accepted: 03/07/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In a world of chemical cues, smell and taste are essential senses for survival. Here we focused on Heliconius, a diverse group of butterflies that exhibit variation in pre- and post-zygotic isolation and chemically-mediated behaviors across their phylogeny. Our study examined the ionotropic receptors, a recently discovered class of receptors that are some of the most ancient chemical receptors. RESULTS We found more ionotropic receptors in Heliconius (31) than in Bombyx mori (25) or in Danaus plexippus (27). Sixteen genes in Lepidoptera were not present in Diptera. Only IR7d4 was exclusively found in butterflies and two expansions of IR60a were exclusive to Heliconius. A genome-wide comparison between 11 Heliconius species revealed instances of pseudogenization, gene gain, and signatures of positive selection across the phylogeny. IR60a2b and IR60a2d are unique to the H. melpomene, H. cydno, and H. timareta clade, a group where chemosensing is likely involved in pre-zygotic isolation. IR60a2b also displayed copy number variations (CNVs) in distinct populations of H. melpomene and was the only gene expressed at significantly higher levels in legs and mouthparts than in antennae, which suggests a gustatory function. dN/dS analysis suggests more frequent positive selection in some intronless IR genes and in particular in the sara/sapho and melpomene/cydno/timareta clades. IR60a1 was the only gene with an elevated dN/dS along a major phylogenetic branch associated with pupal mating. Only IR93a was differentially expressed between sexes. CONCLUSIONS Altogether, these data make Heliconius butterflies one of the very few insects outside Drosophila where IRs have been characterized in detail. Our work outlines a dynamic pattern of IR gene evolution throughout the Heliconius radiation which could be the result of selective pressure to find potential mates or host-plants.
Affiliation(s)
- Bas van Schooten
- Department of Biology, University of Puerto Rico, Rio Piedras, San Juan, Puerto Rico.
| | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK
| | - Adriana D Briscoe
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697, USA
| | - Riccardo Papa
- Department of Biology, University of Puerto Rico, Rio Piedras, San Juan, Puerto Rico
|
67
|
Dos Reis M. How to calculate the non-synonymous to synonymous rate ratio of protein-coding genes under the Fisher-Wright mutation-selection framework. Biol Lett 2016; 11:20141031. [PMID: 25854546 DOI: 10.1098/rsbl.2014.1031] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022] Open
Abstract
First principles of population genetics are used to obtain formulae relating the non-synonymous to synonymous substitution rate ratio to the selection coefficients acting at codon sites in protein-coding genes. Two theoretical cases are discussed and two examples from real data (a chloroplast gene and a virus polymerase) are given. The formulae give much insight into the dynamics of non-synonymous substitutions and may inform the development of methods to detect adaptive evolution.
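The relationship between selection coefficients and the rate ratio can be illustrated with a deliberately simplified sketch. It is not the paper's derivation: it assumes a single site whose states are amino acids rather than codons, equal mutation rates between all states, neutral synonymous changes, and the standard relative fixation rate S / (1 - exp(-S)) for a scaled selection coefficient S. The function names and fitness values are hypothetical.

```python
import math

def relative_fixation_rate(s):
    """Fixation rate of a mutation relative to a neutral one, for scaled selection coefficient s."""
    if abs(s) < 1e-9:
        return 1.0  # neutral limit of s / (1 - exp(-s))
    return s / (1.0 - math.exp(-s))

def site_omega(fitness):
    """Approximate dN/dS at one site from scaled fitnesses (dict: state -> fitness).

    Assumes equal mutation rates, so the equilibrium frequency of each state is
    proportional to exp(fitness) and the neutral expectation for every change is 1.
    """
    states = list(fitness)
    weights = {a: math.exp(fitness[a]) for a in states}
    total = sum(weights.values())
    omega = 0.0
    for a in states:
        pi_a = weights[a] / total
        rates = [relative_fixation_rate(fitness[b] - fitness[a]) for b in states if b != a]
        omega += pi_a * sum(rates) / len(rates)
    return omega

# Hypothetical scaled fitnesses: a flat landscape versus one preferred state.
print(site_omega({"A": 0.0, "G": 0.0, "S": 0.0}))  # no selection
print(site_omega({"A": 0.0, "G": 2.0}))            # purifying selection at equilibrium
```

With a flat landscape every change is neutral and the ratio equals one; once one state is favored, the site spends most of its time in that state and most proposed changes are deleterious, so the ratio falls below one.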
Affiliation(s)
- Mario Dos Reis
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK
|
68
|
Selection maintaining protein stability at equilibrium. J Theor Biol 2016; 391:21-34. [DOI: 10.1016/j.jtbi.2015.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/31/2015] [Revised: 11/29/2015] [Accepted: 12/01/2015] [Indexed: 11/24/2022]
|
69
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 176] [Impact Index Per Article: 22.0] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
| | - Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
|
70
|
Duchêne S, Di Giallonardo F, Holmes EC. Substitution Model Adequacy and Assessing the Reliability of Estimates of Virus Evolutionary Rates and Time Scales. Mol Biol Evol 2015; 33:255-67. [PMID: 26416981 DOI: 10.1093/molbev/msv207] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022] Open
Abstract
Determining the time scale of virus evolution is central to understanding their origins and emergence. The phylogenetic methods commonly used for this purpose can be misleading if the substitution model makes incorrect assumptions about the data. Empirical studies consider a pool of models and select that with the highest statistical fit. However, this does not allow the rejection of all models, even if they poorly describe the data. An alternative is to use model adequacy methods that evaluate the ability of a model to predict hypothetical future observations. This can be done by comparing the empirical data with data generated under the model in question. We conducted simulations to evaluate the sensitivity of such methods with nucleotide, amino acid, and codon data. These effectively detected underparameterized models, but failed to detect mutational saturation and some instances of nonstationary base composition, which can lead to biases in estimates of tree topology and length. To test the applicability of these methods with real data, we analyzed nucleotide and amino acid data sets from the genus Flavivirus of RNA viruses. In most cases these models were inadequate, with the exception of a data set of relatively closely related sequences of Dengue virus, for which the GTR+Γ nucleotide and LG+Γ amino acid substitution models were adequate. Our results partly explain the lack of consensus over estimates of the long-term evolutionary time scale of these viruses, and indicate that assessing the adequacy of substitution models should be routinely used to determine whether estimates are reliable.
Affiliation(s)
- Sebastián Duchêne
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
| | - Francesca Di Giallonardo
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
| | - Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
|
71
|
Spielman SJ, Wilke CO. Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies. PLoS One 2015; 10:e0139047. [PMID: 26397960 PMCID: PMC4580465 DOI: 10.1371/journal.pone.0139047] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 06/15/2015] [Accepted: 09/07/2015] [Indexed: 11/19/2022] Open
Abstract
We introduce Pyvolve, a flexible Python module for simulating genetic data along a phylogeny using continuous-time Markov models of sequence evolution. Easily incorporated into Python bioinformatics pipelines, Pyvolve can simulate sequences according to most standard models of nucleotide, amino-acid, and codon sequence evolution. All model parameters are fully customizable. Users can additionally specify custom evolutionary models, with custom rate matrices and/or states to evolve. This flexibility makes Pyvolve a convenient framework not only for simulating sequences under a wide variety of conditions, but also for developing and testing new evolutionary models. Pyvolve is an open-source project under a FreeBSD license, and it is available for download, along with a detailed user-manual and example scripts, from http://github.com/sjspielman/pyvolve.
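To give a feel for what simulating sequences along a phylogeny with a continuous-time Markov model involves, here is a minimal standard-library sketch. It does not use or reproduce Pyvolve's API (see the project's user manual for that); it assumes the simple Jukes-Cantor (JC69) nucleotide model with independent sites, and the function names, the tuple-based tree encoding, and the toy tree itself are hypothetical.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def jc69_evolve(parent, branch_length, rng):
    """Evolve a sequence along one branch under JC69.

    branch_length is measured in expected substitutions per site.
    """
    # JC69 probability that a site carries the same base at both ends of the branch.
    p_same = 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)
    child = []
    for base in parent:
        if rng.random() < p_same:
            child.append(base)
        else:
            # Otherwise choose one of the three other bases uniformly.
            child.append(rng.choice([n for n in NUCLEOTIDES if n != base]))
    return "".join(child)

def simulate_tree(node, parent_seq, rng, leaves=None):
    """Recursively evolve parent_seq down a tree of (name, branch_length, children) tuples."""
    if leaves is None:
        leaves = {}
    name, branch_length, children = node
    seq = jc69_evolve(parent_seq, branch_length, rng)
    if not children:
        leaves[name] = seq  # only tip sequences are reported
    for child in children:
        simulate_tree(child, seq, rng, leaves)
    return leaves

# Hypothetical toy tree, equivalent to ((A:0.1,B:0.2):0.05,C:0.3).
toy_tree = ("root", 0.0, [
    ("AB", 0.05, [("A", 0.1, []), ("B", 0.2, [])]),
    ("C", 0.3, []),
])
rng = random.Random(42)
root_seq = "".join(rng.choice(NUCLEOTIDES) for _ in range(30))
alignment = simulate_tree(toy_tree, root_seq, rng)
for name, seq in sorted(alignment.items()):
    print(name, seq)
```

Sampling each branch from the closed-form JC69 transition probabilities keeps the sketch short. A general-purpose simulator would instead exponentiate an arbitrary rate matrix, which is what allows custom nucleotide, amino-acid, and codon models.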
Affiliation(s)
- Stephanie J. Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, United States of America
| | - Claus O. Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, United States of America
|
72
|
Bay DC, Chan CS, Turner RJ. NarJ subfamily system specific chaperone diversity and evolution is directed by respiratory enzyme associations. BMC Evol Biol 2015; 15:110. [PMID: 26067063 PMCID: PMC4464133 DOI: 10.1186/s12862-015-0412-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 01/20/2015] [Accepted: 06/04/2015] [Indexed: 12/04/2022] Open
Abstract
Background Redox enzyme maturation proteins (REMPs) describe a diverse family of prokaryotic chaperones involved in the biogenesis of anaerobic complex iron sulfur molybdoenzyme (CISM) respiratory systems. Many REMP family studies have focused on NarJ subfamily members from Escherichia coli: NarJ, NarW, DmsD, TorD and YcdY. The aim of this bioinformatics study was to expand upon the evolution, distribution and genetic association of these 5 REMP members within 130 genome sequenced taxonomically diverse species representing 324 Prokaryotic sequences. NarJ subfamily member diversity was examined at the phylum-species level and at the amino acid/nucleotide level to determine how close their genetic associations were between their respective CISM systems within phyla. Results This study revealed that NarJ members possessed unique motifs that distinguished Gram-negative from Gram-positive/Archaeal species and identified a strict genetic association with its nitrate reductase complex (narGHI) operon compared to all other members. NarW appears to be found specifically in Gammaproteobacteria. DmsD also showed close associations with the dimethylsulfoxide reductase (dmsABC) operon compared to TorD. Phylogenetic analysis revealed that YcdY has recently evolved from DmsD and that YcdY has likely diverged into 2 subfamilies linked to Zn-dependent alkaline phosphatase (ycdX) operons and a newly identified operon containing part of Zn-metallopeptidase FtsH complex component (hflC) and NADH-quinone dehydrogenase (mdaB). TorD demonstrated the greatest diversity in operon association. TorD was identified within operons from either trimethylamine-N-oxide reductase (torAC) or formate dehydrogenase (fdhGHI), where each type of TorD had a unique motif. Additionally, a subgroup of dmsD and torD members were also linked to operons with biotin sulfoxide (bisC) and polysulfide reductase (nrfD), indicating a potential role in the maturation of diverse CISM.
Conclusion Examination of diverse prokaryotic NarJ subfamily members demonstrates that the evolution and genetic association of each member is uniquely biased by its CISM operon association.
Affiliation(s)
- Denice C Bay
- Department of Biological Sciences, University of Calgary, Rm 156 Biological Science Bldg., 2500 University Dr. NW, Calgary, T2N 1N4, AB, Canada.
| | - Catherine S Chan
- Department of Biological Sciences, University of Calgary, Rm 156 Biological Science Bldg., 2500 University Dr. NW, Calgary, T2N 1N4, AB, Canada.
| | - Raymond J Turner
- Department of Biological Sciences, University of Calgary, Rm 156 Biological Science Bldg., 2500 University Dr. NW, Calgary, T2N 1N4, AB, Canada.
|
73
|
Selective Advantages of a Parasexual Cycle for the Yeast Candida albicans. Genetics 2015; 200:1117-32. [PMID: 26063661 DOI: 10.1534/genetics.115.177170] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 04/14/2015] [Accepted: 06/08/2015] [Indexed: 11/18/2022] Open
Abstract
The yeast Candida albicans can mate. However, in the natural environment mating may generate progeny (fusants) fitter than clonal lineages too rarely to render mating biologically significant: C. albicans has never been observed to mate in its natural environment, the human host, and the population structure of the species is largely clonal. It seems incapable of meiosis, and most isolates are diploid and carry both mating-type-like (MTL) locus alleles, preventing mating. Only chromosome loss or localized loss of heterozygosity can generate mating-competent cells, and recombination of parental alleles is limited. To determine if mating is a biologically significant process, we investigated if mating is under selection. The ratio of nonsynonymous to synonymous mutations in mating genes and the frequency of mutations abolishing mating indicated that mating is under selection. The MTL locus is located on chromosome 5, and when we induced chromosome 5 loss in 10 clinical isolates, most of the resulting MTL-homozygotes could mate with each other, producing fusants. In laboratory culture, a novel environment favoring novel genotypes, some fusants grew faster than their parents, in which loss of heterozygosity had reduced growth rates, and also faster than their MTL-heterozygous ancestors, albeit often only after serial propagation. In a small number of experiments in which co-inoculation of an oral colonization model with MTL-homozygotes yielded small numbers of fusants, their numbers declined over time relative to those of the parents. Overall, our results indicate that mating generates genotypes superior to existing MTL-heterozygotes often enough to be under selection.
|
74
|
Gilchrist MA, Chen WC, Shah P, Landerer CL, Zaretzki R. Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone. Genome Biol Evol 2015; 7:1559-79. [PMID: 25977456 PMCID: PMC4494061 DOI: 10.1093/gbe/evv087] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 01/27/2023] Open
Abstract
Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome scale measurements of gene expression. Here, we demonstrate that it is possible to both extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). 
Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and mRNA and ribosome profiling footprint-based estimates of gene expression ([Formula: see text]) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid's "optimal" codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome scale patterns of codon usage; accessing this information does not require gene expression measurements, but instead requires carefully formulated, biologically interpretable models.
Affiliation(s)
- Michael A Gilchrist
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville; National Institute for Mathematical and Biological Synthesis, Knoxville, Tennessee
| | - Wei-Chen Chen
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville; Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland
| | - Premal Shah
- Department of Biology, University of Pennsylvania
| | - Cedric L Landerer
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville
| | - Russell Zaretzki
- National Institute for Mathematical and Biological Synthesis, Knoxville, Tennessee; Department of Business Analytics and Statistics, University of Tennessee, Knoxville
|
75
|
Meyer AG, Spielman SJ, Bedford T, Wilke CO. Time dependence of evolutionary metrics during the 2009 pandemic influenza virus outbreak. Virus Evol 2015; 1:vev006. [PMID: 26770819 PMCID: PMC4710376 DOI: 10.1093/ve/vev006] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/31/2022] Open
Abstract
With the expansion of DNA sequencing technology, quantifying evolution in emerging viral outbreaks has become an important tool for scientists and public health officials. Although it is known that the degree of sequence divergence significantly affects the calculation of evolutionary metrics in viral outbreaks, the extent and duration of this effect during an actual outbreak remains unclear. We have analyzed how limited divergence time during an early viral outbreak affects the accuracy of molecular evolutionary metrics. Using sequence data from the first 25 months of the 2009 pandemic H1N1 (pH1N1) outbreak, we calculated each of three different standard evolutionary metrics, namely molecular clock rate (i.e., evolutionary rate), whole gene dN/dS, and site-wise dN/dS, for hemagglutinin and neuraminidase, using increasingly longer time windows, from 1 month to 25 months. For the molecular clock rate, we found that at least three to four months of temporal divergence from the start of sampling was required to make precise estimates that also agreed with long-term values. For whole gene dN/dS, we found that at least two months of data were required to generate precise estimates, but six to nine months were required for estimates to approach their long term values. For site-wise dN/dS estimates, we found that at least six months of sampling divergence was required before the majority of sites had at least one mutation and were thus evolutionarily informative. Furthermore, eight months of sampling divergence was required before the site-wise estimates appropriately reflected the distribution of values expected from known protein-structure-based evolutionary pressure in influenza. In summary, we found that evolutionary metrics calculated from gene sequence data in early outbreaks should be expected to deviate from their long-term estimates for at least several months after the initial emergence and sequencing of the virus.
Affiliation(s)
- Austin G. Meyer
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
- School of Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA, 79430
| | - Stephanie J. Spielman
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
| | - Trevor Bedford
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA, 98109
| | - Claus O. Wilke
- Department of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX, USA, 78712
|