1
Savage AM, Willmott MJ, Moreno‐García P, Jagiello Z, Li D, Malesis A, Miles LS, Román‐Palacios C, Salazar‐Valenzuela D, Verrelli BC, Winchell KM, Alberti M, Bonilla‐Bedoya S, Carlen E, Falvey C, Johnson L, Martin E, Kuzyo H, Marzluff J, Munshi‐South J, Phifer‐Rixey M, Stadnicki I, Szulkin M, Zhou Y, Gotanda KM. Online toolkits for collaborative and inclusive global research in urban evolutionary ecology. Ecol Evol 2024; 14:e11633. [PMID: 38919647 PMCID: PMC11197044 DOI: 10.1002/ece3.11633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 06/05/2024] [Accepted: 06/12/2024] [Indexed: 06/27/2024] Open
Abstract
Urban evolutionary ecology is inherently interdisciplinary. Moreover, it is a field with global significance. However, bringing researchers and resources together across fields and countries is challenging. Therefore, an online collaborative research hub, where common methods and best practices are shared among scientists from diverse geographic, ethnic, and career backgrounds would make research focused on urban evolutionary ecology more inclusive. Here, we describe a freely available online research hub for toolkits that facilitate global research in urban evolutionary ecology. We provide rationales and descriptions of toolkits for: (1) decolonizing urban evolutionary ecology; (2) identifying and fostering international collaborative partnerships; (3) common methods and freely-available datasets for trait mapping across cities; (4) common methods and freely-available datasets for cross-city evolutionary ecology experiments; and (5) best practices and freely available resources for public outreach and communication of research findings in urban evolutionary ecology. We outline how the toolkits can be accessed, archived, and modified over time in order to sustain long-term global research that will advance our understanding of urban evolutionary ecology.
Affiliation(s)
- Amy M. Savage
  - Department of Biology & Center for Computational and Integrative Biology, Rutgers University – Camden, Camden, New Jersey, USA
- Meredith J. Willmott
  - Department of Biology & Center for Computational and Integrative Biology, Rutgers University – Camden, Camden, New Jersey, USA
- Pablo Moreno‐García
  - Department of Biological Sciences, Center for Computation & Technology, Louisiana State University, Baton Rouge, Louisiana, USA
- Zuzanna Jagiello
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Daijiang Li
  - Department of Biological Sciences, Center for Computation & Technology, Louisiana State University, Baton Rouge, Louisiana, USA
- Anna Malesis
  - Department of Urban Design and Planning, University of Washington, Seattle, Washington, USA
- Lindsay S. Miles
  - Virginia Polytechnic and State University, Entomology Department, Blacksburg, Virginia, USA
- David Salazar‐Valenzuela
  - Centro de Investigación de la Biodiversidad y Cambio Climático & Facultad de Ciencias de Medio Ambiente, Universidad Indoamérica, Quito, Ecuador
- Brian C. Verrelli
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Marina Alberti
  - Department of Urban Design and Planning, University of Washington, Seattle, Washington, USA
- Elizabeth Carlen
  - Department of Biology, Washington University of St. Louis, St. Louis, Missouri, USA
- Cleo Falvey
  - Department of Biology & Center for Computational and Integrative Biology, Rutgers University – Camden, Camden, New Jersey, USA
- Lauren Johnson
  - Department of Biology, Washington University of St. Louis, St. Louis, Missouri, USA
- Ella Martin
  - Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Hanna Kuzyo
  - Frankfurt Zoological Society, Frankfurt, Germany
- John Marzluff
  - Department of Urban Design and Planning, University of Washington, Seattle, Washington, USA
- Jason Munshi‐South
  - Louis Calder Center & Department of Biological Sciences, Fordham University, Armonk, New York, USA
- Ignacy Stadnicki
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Marta Szulkin
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Yuyu Zhou
  - Department of Geological and Atmospheric Sciences, Iowa State University, Ames, Iowa, USA
- Kiyoko M. Gotanda
  - Department of Biological Sciences, Brock University, St. Catharines, Ontario, Canada

2
Molteni C, Forni D, Cagliani R, Arrigoni F, Pozzoli U, De Gioia L, Sironi M. Selective events at individual sites underlie the evolution of monkeypox virus clades. Virus Evol 2023; 9:vead031. [PMID: 37305708 PMCID: PMC10256197 DOI: 10.1093/ve/vead031] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/19/2023] [Revised: 03/31/2023] [Accepted: 05/12/2023] [Indexed: 06/13/2023] Open
Abstract
In endemic regions (West Africa and the Congo Basin), the genetic diversity of monkeypox virus (MPXV) is geographically structured into two major clades (Clades I and II) that differ in virulence and host associations. Clade IIb is closely related to the B.1 lineage, which is dominating a worldwide outbreak initiated in 2022. Lineage B.1 has however accumulated mutations of unknown significance that most likely result from apolipoprotein B mRNA editing catalytic polypeptide-like 3 (APOBEC3) editing. We applied a population genetics-phylogenetics approach to investigate the evolution of MPXV during historical viral spread in Africa and to infer the distribution of fitness effects. We observed a high preponderance of codons evolving under strong purifying selection, particularly in viral genes involved in morphogenesis and replication or transcription. However, signals of positive selection were also detected and were enriched in genes involved in immunomodulation and/or virulence. In particular, several genes showing evidence of positive selection were found to hijack different steps of the cellular pathway that senses cytosolic DNA. Also, a few selected sites in genes that are not directly involved in immunomodulation are suggestive of antibody escape or other immune-mediated pressures. Because orthopoxvirus host range is primarily determined by the interaction with the host immune system, we suggest that the positive selection signals represent signatures of host adaptation and contribute to the different virulence of Clade I and II MPXVs. We also used the calculated selection coefficients to infer the effects of mutations that define the predominant human MPXV1 (hMPXV1) lineage B.1, as well as the changes that have been accumulating during the worldwide outbreak. Results indicated that a proportion of deleterious mutations were purged from the predominant outbreak lineage, whose spread was not driven by the presence of beneficial changes. 
Polymorphic mutations with a predicted beneficial effect on fitness are few and have a low frequency. It remains to be determined whether they have any significance for ongoing virus evolution.
Affiliation(s)
- Cristian Molteni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Via don Luigi Monza, Bosisio Parini 23842, Italy
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Via don Luigi Monza, Bosisio Parini 23842, Italy
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Via don Luigi Monza, Bosisio Parini 23842, Italy
- Federica Arrigoni
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Piazza della Scienza, Milan 20126, Italy
- Uberto Pozzoli
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Via don Luigi Monza, Bosisio Parini 23842, Italy
- Luca De Gioia
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Piazza della Scienza, Milan 20126, Italy
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Via don Luigi Monza, Bosisio Parini 23842, Italy

3
Li NK, Corander J, Grad YH, Chang HH. Discovering recent selection forces shaping the evolution of dengue viruses based on polymorphism data across geographic scales. Virus Evol 2022; 8:veac108. [PMID: 36601300 PMCID: PMC9789396 DOI: 10.1093/ve/veac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 09/23/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022] Open
Abstract
Incomplete selection makes it challenging to infer selection on genes at short time scales, especially for microorganisms, due to stronger linkage between loci. However, in many cases, the selective force changes with environment, time, or other factors, and it is of great interest to understand selective forces at this level to answer relevant biological questions. We developed a new method that uses the change in dN/dS, instead of the absolute value of dN/dS, to infer the dominating selective force based on sequence data across geographical scales. If a gene was under positive selection, dN/dS was expected to increase through time, whereas if a gene was under negative selection, dN/dS was expected to decrease through time. Assuming that the migration rate decreased and the divergence time between samples increased from between-continent, within-continent different-country, to within-country level, dN/dS of a gene dominated by positive selection was expected to increase with increasing geographical scales, and the opposite trend was expected in the case of negative selection. Motivated by the McDonald-Kreitman (MK) test, we developed a pairwise MK test to assess the statistical significance of detected trends in dN/dS. Application of the method to a global sample of dengue virus genomes identified multiple significant signatures of selection in both the structural and non-structural proteins. Because this method does not require allele frequency estimates and uses synonymous mutations for comparison, it is less prone to sampling error, providing a way to infer selection forces within species using publicly available genomic data from locations over broad geographical scales.
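For readers unfamiliar with the test that motivates this method, the following is a minimal sketch of the classical McDonald-Kreitman 2x2 test, not the authors' pairwise variant. It compares nonsynonymous and synonymous counts between fixed differences and polymorphisms using a two-sided Fisher exact test. The function names and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Classical MK test.

    dn, ds: nonsynonymous / synonymous fixed differences (divergence).
    pn, ps: nonsynonymous / synonymous polymorphisms.
    Returns (neutrality index, Fisher p-value). Assumes dn, ds and ps are non-zero.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    p_value = fisher_exact_two_sided(dn, pn, ds, ps)
    return neutrality_index, p_value


# Hypothetical counts, for illustration only.
ni, p = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"neutrality index = {ni:.3f}, p = {p:.4f}")
```

A neutrality index below 1 indicates an excess of nonsynonymous divergence, which is conventionally read as a signal of positive selection; values above 1 point to an excess of nonsynonymous polymorphism.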
Affiliation(s)
- Nien-Kung Li
  - Department of Life Science & Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2, Kuang-Fu Road, Hsinchu 300044, Taiwan
- Jukka Corander
  - Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, Yliopistonkatu 3, Helsinki 00014, Finland
  - Department of Biostatistics, University of Oslo, Domus Medica Gaustad Sognsvannsveien 9, Oslo 0372, Norway
  - Parasites and Microbes, The Wellcome Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

4
Li T, Wong TKF, Ranjard L, Rodrigo AG. pgHMA: Application of the heteroduplex mobility assay analysis in phylogenetics and population genetics. Mol Ecol Resour 2021; 22:653-663. [PMID: 34551204 DOI: 10.1111/1755-0998.13508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Revised: 09/01/2021] [Accepted: 09/06/2021] [Indexed: 11/26/2022]
Abstract
The heteroduplex mobility assay (HMA) has proven to be a robust tool for the detection of genetic variation. Here, we describe a simple and rapid application of the HMA by microfluidic capillary electrophoresis, for phylogenetics and population genetic analyses (pgHMA). We show how commonly applied techniques in phylogenetics and population genetics have equivalents with pgHMA: phylogenetic reconstruction with bootstrapping, skyline plots, and mismatch distribution analysis. We assess the performance and accuracy of pgHMA by comparing the results obtained against those obtained using standard methods of analyses applied to sequencing data. The resulting comparisons demonstrate that: (a) there is a significant linear relationship (R² = .992) between heteroduplex mobility and genetic distance, (b) phylogenetic trees obtained by HMA and nucleotide sequences present nearly identical topologies, (c) clades with high pgHMA parametric bootstrap support also have high bootstrap support on nucleotide phylogenies, (d) skyline plots estimated from the UPGMA trees of HMA and Bayesian trees of nucleotide data reveal similar trends, especially for the median trend estimate of effective population size, and (e) optimized mismatch distributions of HMA are closely fitted to the mismatch distributions of nucleotide sequences. In summary, pgHMA is an easily applied method for approximating phylogenetic diversity and population trends.
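The linear relationship in point (a) is the kind of calibration that an ordinary least-squares fit can summarize. The sketch below is an assumption-laden illustration, not the authors' pipeline: it fits a line to paired (mobility, genetic distance) values and reports the coefficient of determination. The function name and the data values are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical measurements: heteroduplex mobility shift (arbitrary units)
# paired with pairwise genetic distance (substitutions per site).
mobility = [0.9, 2.1, 3.9, 6.2, 7.9]
distance = [0.01, 0.02, 0.04, 0.06, 0.08]

slope, intercept, r2 = linear_fit(mobility, distance)
print(f"distance ~ {intercept:.4f} + {slope:.4f} * mobility (R^2 = {r2:.3f})")
```

Once such a fit is available, a measured mobility can be converted to an approximate genetic distance, which is what allows distance-based methods such as UPGMA to be applied to assay output.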
Affiliation(s)
- Teng Li
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Thomas K F Wong
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
- Louis Ranjard
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
  - PlantTech Research Institute, Tauranga, New Zealand
- Allen G Rodrigo
  - Research School of Biology, Australian National University, Canberra, ACT, Australia
  - School of Biological Sciences, University of Auckland, Auckland, New Zealand

5
Forni D, Cagliani R, Arrigoni F, Benvenuti M, Mozzi A, Pozzoli U, Clerici M, De Gioia L, Sironi M. Adaptation of the endemic coronaviruses HCoV-OC43 and HCoV-229E to the human host. Virus Evol 2021; 7:veab061. [PMID: 34527284 PMCID: PMC8344746 DOI: 10.1093/ve/veab061] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/08/2021] [Revised: 06/18/2021] [Accepted: 06/23/2021] [Indexed: 12/29/2022] Open
Abstract
Four coronaviruses (HCoV-OC43, HCoV-HKU1, HCoV-NL63, and HCoV-229E) are endemic in human populations. All these viruses are seasonal and generate short-term immunity. Like the highly pathogenic coronaviruses, the endemic coronaviruses have zoonotic origins. Thus, understanding the evolutionary dynamics of these human viruses might provide insight into the future trajectories of SARS-CoV-2 evolution. Because the zoonotic sources of HCoV-OC43 and HCoV-229E are known, we applied a population genetics-phylogenetic approach to investigate which selective events accompanied the divergence of these viruses from the animal ones. Results indicated that positive selection drove the evolution of some accessory proteins, as well as of the membrane proteins. However, the spike proteins of both viruses and the hemagglutinin-esterase (HE) of HCoV-OC43 represented the major selection targets. Specifically, for both viruses, most positively selected sites map to the receptor-binding domains (RBDs) and are polymorphic. Molecular dating for the HCoV-229E spike protein indicated that RBD Classes I, II, III, and IV emerged 3-9 years apart. However, since the appearance of Class V (with much higher binding affinity), around 25 years ago, limited genetic diversity accumulated in the RBD. These different time intervals are not fully consistent with the hypothesis that HCoV-229E spike evolution was driven by antigenic drift. An alternative, not mutually exclusive possibility is that strains with higher affinity for the cellular receptor have out-competed strains with lower affinity. The evolution of the HCoV-OC43 spike protein was also suggested to undergo antigenic drift. However, we also found abundant signals of positive selection in HE. Whereas such signals might result from antigenic drift, as well, previous data showing co-evolution of the spike protein with HE suggest that optimization for human cell infection also drove the evolution of this virus. 
These data provide insight into the possible trajectories of SARS-CoV-2 evolution, especially in case the virus should become endemic.
Affiliation(s)
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, via don Luigi Monza, 23843 Bosisio Parini, Italy
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, via don Luigi Monza, 23843 Bosisio Parini, Italy
- Federica Arrigoni
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Piazza della Scienza, Milan 20126, Italy
- Martino Benvenuti
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Piazza della Scienza, Milan 20126, Italy
- Alessandra Mozzi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, via don Luigi Monza, 23843 Bosisio Parini, Italy
- Uberto Pozzoli
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, via don Luigi Monza, 23843 Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, via Francesco Sforza, Milan 20122, Italy
- Luca De Gioia
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Piazza della Scienza, Milan 20126, Italy
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, via don Luigi Monza, 23843 Bosisio Parini, Italy

6
Latrille T, Lanore V, Lartillot N. Inferring long-term effective population size with Mutation-Selection Models. Mol Biol Evol 2021; 38:4573-4587. [PMID: 34191010 PMCID: PMC8476147 DOI: 10.1093/molbev/msab160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022] Open
Abstract
Mutation–selection phylogenetic codon models are grounded on population genetics first principles and represent a principled approach for investigating the intricate interplay between mutation, selection, and drift. In their current form, mutation–selection codon models are entirely characterized by the collection of site-specific amino-acid fitness profiles. However, thus far, they have relied on the assumption of a constant genetic drift, translating into a unique effective population size (Ne) across the phylogeny, clearly an unrealistic assumption. This assumption can be alleviated by introducing variation in Ne between lineages. In addition to Ne, the mutation rate (μ) is susceptible to vary between lineages, and both should covary with life-history traits (LHTs). This suggests that the model should more globally account for the joint evolutionary process followed by all of these lineage-specific variables (Ne, μ, and LHTs). In this direction, we introduce an extended mutation–selection model jointly reconstructing in a Bayesian Monte Carlo framework the fitness landscape across sites and long-term trends in Ne, μ, and LHTs along the phylogeny, from an alignment of DNA coding sequences and a matrix of observed LHTs in extant species. The model was tested against simulated data and applied to empirical data in mammals, isopods, and primates. The reconstructed history of Ne in these groups appears to correlate with LHTs or ecological variables in a way that suggests that the reconstruction is reasonable, at least in its global trends. On the other hand, the range of variation in Ne inferred across species is surprisingly narrow. This last point suggests that some of the assumptions of the model, in particular concerning the assumed absence of epistatic interactions between sites, are potentially problematic.
Affiliation(s)
- T Latrille
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622, Villeurbanne, France
  - École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France
- V Lanore
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622, Villeurbanne, France
- N Lartillot
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622, Villeurbanne, France

7
De Maio N, Walker CR, Turakhia Y, Lanfear R, Corbett-Detig R, Goldman N. Mutation Rates and Selection on Synonymous Mutations in SARS-CoV-2. Genome Biol Evol 2021; 13:evab087. [PMID: 33895815 PMCID: PMC8135539 DOI: 10.1093/gbe/evab087] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Accepted: 04/19/2021] [Indexed: 12/23/2022] Open
Abstract
The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. Although previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
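One reason raw mutation counts can mislead is that the four nucleotides are not equally common in a genome, so a frequent source base offers more opportunities to mutate. The sketch below illustrates that single point under simplifying assumptions and is not the authors' estimator: it divides the number of inferred mutation events of each type by the number of reference sites carrying the source nucleotide. The function name, the toy reference sequence, and the event list are all hypothetical, and inferring the events themselves (including recurrent ones) from a phylogeny is not shown.

```python
from collections import Counter


def relative_mutation_rates(reference, events):
    """Per-site mutation-event counts for each (source, target) mutation type.

    reference: genome sequence as a string of nucleotides.
    events: iterable of (source, target) pairs, one per inferred mutation
            event, so recurrent mutations at one site appear more than once.
    """
    base_counts = Counter(reference)
    event_counts = Counter(events)
    # Normalize by how many sites could have produced each mutation type.
    return {
        (src, dst): count / base_counts[src]
        for (src, dst), count in event_counts.items()
    }


# Toy, hypothetical data: a U/A-rich reference and four events of each type.
reference = "G" * 2 + "C" * 2 + "U" * 8 + "A" * 8
events = [("C", "U")] * 4 + [("G", "U")] * 4 + [("U", "C")] * 4 + [("A", "G")] * 4

for mutation_type, rate in sorted(relative_mutation_rates(reference, events).items()):
    print(mutation_type, rate)
```

In this toy input every mutation type has the same raw count, yet the normalized values differ because C and G are rare in the reference, which is the kind of compositional skew a rate estimate has to account for.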
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
- Conor R Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom
  - Department of Genetics, University of Cambridge, United Kingdom
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT, Australia
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California, Santa Cruz, California, USA
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridgeshire, United Kingdom

8
Abstract
The dN/dS ratio provides evidence of adaptation or functional constraint in protein-coding genes by quantifying the relative excess or deficit of amino acid-replacing versus silent nucleotide variation. Inexpensive sequencing promises a better understanding of parameters, such as dN/dS, but analyzing very large data sets poses a major statistical challenge. Here, I introduce genomegaMap for estimating within-species genome-wide variation in dN/dS, and I apply it to 3,979 genes across 10,209 tuberculosis genomes to characterize the selection pressures shaping this global pathogen. GenomegaMap is a phylogeny-free method that addresses two major problems with existing approaches: 1) It is fast no matter how large the sample size and 2) it is robust to recombination, which causes phylogenetic methods to report artefactual signals of adaptation. GenomegaMap uses population genetics theory to approximate the distribution of allele frequencies under general, parent-dependent mutation models. Coalescent simulations show that substitution parameters are well estimated even when genomegaMap’s simplifying assumption of independence among sites is violated. I demonstrate the ability of genomegaMap to detect genuine signatures of selection at antimicrobial resistance-conferring substitutions in Mycobacterium tuberculosis and describe a novel signature of selection in the cold-shock DEAD-box protein A gene deaD/csdA. The genomegaMap approach helps accelerate the exploitation of big data for gaining new insights into evolution within species.
Affiliation(s)
- Daniel J Wilson
  - Big Data Institute, Nuffield Department of Population Health, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom

9
Selberg AGA, Gaucher EA, Liberles DA. Ancestral Sequence Reconstruction: From Chemical Paleogenetics to Maximum Likelihood Algorithms and Beyond. J Mol Evol 2021; 89:157-164. [PMID: 33486547 PMCID: PMC7828096 DOI: 10.1007/s00239-021-09993-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/04/2020] [Accepted: 01/04/2021] [Indexed: 12/13/2022]
Abstract
As both a computational and an experimental endeavor, ancestral sequence reconstruction remains a timely and important technique. Modern approaches to conduct ancestral sequence reconstruction for proteins are built upon a conceptual framework from journal founder Emile Zuckerkandl. On top of this, work on maximum likelihood phylogenetics published in Journal of Molecular Evolution in 1996 was one of the first approaches for generating maximum likelihood ancestral sequences of proteins. From its computational history, future model development needs as well as potential applications in areas as diverse as computational systems biology, molecular community ecology, infectious disease therapeutics and other biomedical applications, and biotechnology are discussed. From its past in this journal, there is a bright future for ancestral sequence reconstruction in the field of evolutionary biology.
Affiliation(s)
- Avery G A Selberg
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
- Eric A Gaucher
  - Department of Biology, Georgia State University, Atlanta, GA, 30303, USA
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA

10
De Maio N, Walker CR, Turakhia Y, Lanfear R, Corbett-Detig R, Goldman N. Mutation rates and selection on synonymous mutations in SARS-CoV-2. bioRxiv 2021:2021.01.14.426705. [PMID: 33469589 PMCID: PMC7814826 DOI: 10.1101/2021.01.14.426705] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]
Abstract
The COVID-19 pandemic has seen an unprecedented response from the sequencing community. Leveraging the sequence data from more than 140,000 SARS-CoV-2 genomes, we study mutation rates and selective pressures affecting the virus. Understanding the processes and effects of mutation and selection has profound implications for the study of viral evolution, for vaccine design, and for the tracking of viral spread. We highlight and address some common genome sequence analysis pitfalls that can lead to inaccurate inference of mutation rates and selection, such as ignoring skews in the genetic code, not accounting for recurrent mutations, and assuming evolutionary equilibrium. We find that two particular mutation rates, G→U and C→U, are similarly elevated and considerably higher than all other mutation rates, causing the majority of mutations in the SARS-CoV-2 genome, and are possibly the result of APOBEC and ROS activity. These mutations also tend to occur many times at the same genome positions along the global SARS-CoV-2 phylogeny (i.e., they are very homoplasic). We observe an effect of genomic context on mutation rates, but the effect of the context is overall limited. While previous studies have suggested selection acting to decrease U content at synonymous sites, we bring forward evidence suggesting the opposite.
Affiliation(s)
- Nicola De Maio
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Conor R Walker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Yatish Turakhia
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert Lanfear
  - Department of Ecology and Evolution, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Russell Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK

11
Cagliani R, Forni D, Clerici M, Sironi M. Computational Inference of Selection Underlying the Evolution of the Novel Coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2. J Virol 2020; 94:e00411-20. [PMID: 32238584 PMCID: PMC7307108 DOI: 10.1128/jvi.00411-20] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 03/10/2020] [Accepted: 03/26/2020] [Indexed: 11/20/2022] Open
Abstract
The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that recently emerged in China is thought to have a bat origin, as its closest known relative (BatCoV RaTG13) was described previously in horseshoe bats. We analyzed the selective events that accompanied the divergence of SARS-CoV-2 from BatCoV RaTG13. To this end, we applied a population genetics-phylogenetics approach, which leverages within-population variation and divergence from an outgroup. Results indicated that most sites in the viral open reading frames (ORFs) evolved under conditions of strong to moderate purifying selection. The most highly constrained sequences corresponded to some nonstructural proteins (nsps) and to the M protein. Conversely, nsp1 and accessory ORFs, particularly ORF8, had a nonnegligible proportion of codons evolving under conditions of very weak purifying selection or close to selective neutrality. Overall, limited evidence of positive selection was detected. The 6 bona fide positively selected sites were located in the N protein, in ORF8, and in nsp1. A signal of positive selection was also detected in the receptor-binding motif (RBM) of the spike protein but most likely resulted from a recombination event that involved the BatCoV RaTG13 sequence. In line with previous data, we suggest that the common ancestor of SARS-CoV-2 and BatCoV RaTG13 encoded/encodes an RBM similar to that observed in SARS-CoV-2 itself and in some pangolin viruses. It is presently unknown whether the common ancestor still exists and, if so, which animals it infects. Our data, however, indicate that divergence of SARS-CoV-2 from BatCoV RaTG13 was accompanied by limited episodes of positive selection, suggesting that the common ancestor of the two viruses was poised for human infection.
IMPORTANCE Coronaviruses are dangerous zoonotic pathogens; in the last 2 decades, three coronaviruses have crossed the species barrier and caused human epidemics. One of these is the recently emerged SARS-CoV-2. We investigated how, since its divergence from a closely related bat virus, natural selection shaped the genome of SARS-CoV-2. We found that distinct coding regions in the SARS-CoV-2 genome evolved under conditions of different degrees of constraint and are consequently more or less prone to tolerate amino acid substitutions. In practical terms, the level of constraint provides indications about which proteins/protein regions are better suited as possible targets for the development of antivirals or vaccines. We also detected limited signals of positive selection in three viral ORFs. However, we warn that, in the absence of knowledge about the chain of events that determined the human spillover, these signals should not be necessarily interpreted as evidence of an adaptation to our species.
Affiliation(s)
- Rachele Cagliani
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Diego Forni
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Manuela Sironi
  - Scientific Institute IRCCS E. MEDEA, Bioinformatics, Bosisio Parini, Italy
|
12
|
Mozzi A, Forni D, Cagliani R, Clerici M, Pozzoli U, Sironi M. Intrinsically disordered regions are abundant in simplexvirus proteomes and display signatures of positive selection. Virus Evol 2020; 6:veaa028. [PMID: 32411391 PMCID: PMC7211401 DOI: 10.1093/ve/veaa028] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023] Open
Abstract
Whereas the majority of herpesviruses co-speciated with their mammalian hosts, human herpes simplex virus 2 (HSV-2, genus Simplexvirus) most likely originated from the cross-species transmission of chimpanzee herpesvirus 1 to an ancestor of modern humans. We exploited the peculiar evolutionary history of HSV-2 to investigate the selective events that drove herpesvirus adaptation to a new host. We show that HSV-2 intrinsically disordered regions (IDRs)-that is, protein domains that do not adopt compact three-dimensional structures-are strongly enriched in positive selection signals. Analysis of viral proteomes indicated that a significantly higher portion of simplexvirus proteins is disordered compared with the proteins of other human herpesviruses. IDR abundance in simplexvirus proteomes was not a consequence of the base composition of their genomes (high G + C content). Conversely, protein function determines the IDR fraction, which is significantly higher in viral proteins that interact with human factors. We also found that the average extent of disorder in herpesvirus proteins tends to parallel that of their human interactors. These data suggest that viruses that interact with fast-evolving, disordered human proteins, in turn, evolve disordered viral interactors poised for innovation. We propose that the high IDR fraction present in simplexvirus proteomes contributes to their wider host range compared with other herpesviruses.
Affiliation(s)
- Alessandra Mozzi
  - Scientific Institute, IRCCS E. MEDEA, Bioinformatics, Bosisio Parini 23842, Italy
- Diego Forni
  - Scientific Institute, IRCCS E. MEDEA, Bioinformatics, Bosisio Parini 23842, Italy
- Rachele Cagliani
  - Scientific Institute, IRCCS E. MEDEA, Bioinformatics, Bosisio Parini 23842, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, Milan 20090, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, Milan 20148, Italy
- Uberto Pozzoli
  - Scientific Institute, IRCCS E. MEDEA, Bioinformatics, Bosisio Parini 23842, Italy
- Manuela Sironi
  - Scientific Institute, IRCCS E. MEDEA, Bioinformatics, Bosisio Parini 23842, Italy
|
13
|
Past and ongoing adaptation of human cytomegalovirus to its host. PLoS Pathog 2020; 16:e1008476. [PMID: 32384127 PMCID: PMC7239485 DOI: 10.1371/journal.ppat.1008476] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/30/2019] [Revised: 05/20/2020] [Accepted: 03/13/2020] [Indexed: 12/18/2022] Open
Abstract
Cytomegaloviruses (order Herpesvirales) display remarkable species-specificity as a result of long-term co-evolution with their mammalian hosts. Human cytomegalovirus (HCMV) is exquisitely adapted to our species and displays high genetic diversity. We leveraged information on inter-species divergence of primate-infecting cytomegaloviruses and intra-species diversity of clinical isolates to provide a genome-wide picture of HCMV adaptation across different time-frames. During adaptation to the human host, core viral genes were commonly targeted by positive selection. Functional characterization of adaptive mutations in the primase gene (UL70) indicated that selection favored amino acid replacements that decrease viral replication in human fibroblasts, suggesting evolution towards viral temperance. HCMV intra-species diversity was largely governed by immune system-driven selective pressure, with several adaptive variants located in antigenic domains. A significant excess of positively selected sites was also detected in the signal peptides (SPs) of viral proteins, indicating that, although they are removed from mature proteins, SPs can contribute to viral adaptation. Functional characterization of one of these SPs indicated that adaptive variants modulate the timing of cleavage by the signal peptidase and the dynamics of glycoprotein intracellular trafficking. We thus used evolutionary information to generate experimentally-testable hypotheses on the functional effect of HCMV genetic diversity and we define modulators of viral phenotypes.
Human cytomegalovirus (HCMV), which represents the most common infectious cause of birth defects, is perfectly adapted to infect humans. We performed a two-tier analysis of HCMV evolution, by describing selective events that occurred during HCMV adaptation to our species and by identifying more recently emerged adaptive variants in clinical isolates. We show that distinct viral genes were targeted by natural selection over different time frames and we generate a catalog of adaptive variants that represent candidate determinants of viral phenotypic variation. As a proof of concept, we show that adaptive changes in the viral primase modulate viral growth in vitro and that selected variants in the UL144 signal peptide affect glycoprotein intracellular trafficking.
|
14
|
Mugal CF, Kutschera VE, Botero-Castro F, Wolf JBW, Kaj I. Polymorphism Data Assist Estimation of the Nonsynonymous over Synonymous Fixation Rate Ratio ω for Closely Related Species. Mol Biol Evol 2020; 37:260-279. [PMID: 31504782 PMCID: PMC6984366 DOI: 10.1093/molbev/msz203] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/24/2022] Open
Abstract
The ratio of nonsynonymous over synonymous sequence divergence, dN/dS, is a widely used estimate of the nonsynonymous over synonymous fixation rate ratio ω, which measures the extent to which natural selection modulates protein sequence evolution. Its computation is based on a phylogenetic approach and computes sequence divergence of protein-coding DNA between species, traditionally using a single representative DNA sequence per species. This approach ignores the presence of polymorphisms and relies on the indirect assumption that new mutations fix instantaneously, an assumption which is generally violated and reasonable only for distantly related species. The violation of the underlying assumption leads to a time-dependence of sequence divergence, and biased estimates of ω in particular for closely related species, where the contribution of ancestral and lineage-specific polymorphisms to sequence divergence is substantial. We here use a time-dependent Poisson random field model to derive an analytical expression of dN/dS as a function of divergence time and sample size. We then extend our framework to the estimation of the proportion of adaptive protein evolution α. This mathematical treatment enables us to show that the joint usage of polymorphism and divergence data can assist the inference of selection for closely related species. Moreover, our analytical results provide the basis for a protocol for the estimation of ω and α for closely related species. We illustrate the performance of this protocol by studying a population data set of four corvid species, which involves the estimation of ω and α at different time-scales and for several choices of sample sizes.
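To make the two quantities named in this abstract concrete, here is a minimal Python sketch. It is not the time-dependent Poisson random field expression the authors derive. It uses the naive per-site ratio for ω (no multiple-hit correction) and the classic McDonald-Kreitman-style estimator α = 1 − (Ds·Pn)/(Dn·Ps). The function names and all counts are hypothetical.

```python
def omega_from_counts(dn, ds, n_sites, s_sites):
    """Naive omega: nonsynonymous over synonymous differences per site.

    Assumption: no correction for multiple substitutions at a site, so this
    is only a rough stand-in for a model-based dN/dS estimate.
    """
    if ds == 0 or n_sites == 0 or s_sites == 0:
        raise ValueError("ds, n_sites and s_sites must be non-zero")
    return (dn / n_sites) / (ds / s_sites)


def alpha_mk(dn, ds, pn, ps):
    """McDonald-Kreitman-style proportion of adaptive substitutions.

    dn, ds: nonsynonymous / synonymous fixed differences between species.
    pn, ps: nonsynonymous / synonymous polymorphisms within a species.
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
omega = omega_from_counts(dn=30, ds=60, n_sites=300, s_sites=100)
alpha = alpha_mk(dn=40, ds=80, pn=10, ps=40)
```

Both estimators treat every observed difference as a fixed substitution, which is exactly the assumption the abstract describes as violated for closely related species.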
Affiliation(s)
- Carina F Mugal
  - Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Verena E Kutschera
  - Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
  - Science for Life Laboratory, Stockholm University, Stockholm, Sweden
  - Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Fidel Botero-Castro
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
- Jochen B W Wolf
  - Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Planegg-Martinsried, Germany
- Ingemar Kaj
  - Department of Mathematics, Uppsala University, Uppsala, Sweden
|
15
|
Tataru P, Bataillon T. polyDFE: Inferring the Distribution of Fitness Effects and Properties of Beneficial Mutations from Polymorphism Data. Methods Mol Biol 2020; 2090:125-146. [PMID: 31975166 DOI: 10.1007/978-1-0716-0199-0_6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 02/02/2023]
Abstract
The possible evolutionary trajectories a population can follow are determined by the fitness effects of new mutations. Their relative frequencies are best specified through a distribution of fitness effects (DFE) that spans deleterious, neutral, and beneficial mutations. As such, the DFE is key to several aspects of the evolution of a population, and particularly the rate of adaptive molecular evolution (α). Inference of DFE from patterns of polymorphism and divergence has been a longstanding goal of evolutionary genetics. polyDFE provides a flexible statistical framework to estimate the DFE and α from site frequency spectrum (SFS) data. Several probability distributions can be fitted to the data to model the DFE. The method also jointly estimates a series of nuisance parameters that model the effect of unknown demography as well as data imperfections, in particular possible errors in polarizing SNPs. This chapter is organized as a tutorial for polyDFE. We start by briefly reviewing the concept of DFE, α, and the principles underlying the method, and then provide an example using central chimpanzee data (Tataru et al., Genetics 207(3):1103-1119, 2017; Bataillon et al., Genome Biol Evol 7(4):1122-1132, 2015) to guide the user through the different steps of an analysis: formatting the data as input to polyDFE, fitting different models, obtaining estimates of parameter uncertainty and performing statistical tests, as well as model averaging procedures to obtain robust estimates of model parameters.
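For readers unfamiliar with the site frequency spectrum that this abstract takes as input data, the Python sketch below shows how an unfolded and a folded SFS can be tabulated from per-site derived-allele counts. It does not reproduce the polyDFE input format or its inference. The function names and the example counts are hypothetical.

```python
def unfolded_sfs(derived_counts, n):
    """Tabulate the unfolded SFS for a sample of n chromosomes.

    derived_counts: one integer per site, the number of sampled chromosomes
    carrying the derived allele. Entry i-1 of the result is the number of
    sites where the derived allele appears exactly i times (1 <= i <= n-1).
    Monomorphic sites (0 or n copies) are skipped.
    """
    sfs = [0] * (n - 1)
    for count in derived_counts:
        if 0 < count < n:
            sfs[count - 1] += 1
    return sfs


def fold_sfs(sfs):
    """Fold an unfolded SFS by minor-allele count.

    Useful when the ancestral state is uncertain, for example when
    polarization errors are suspected.
    """
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        if i == n - i:
            folded.append(sfs[i - 1])          # middle class counted once
        else:
            folded.append(sfs[i - 1] + sfs[n - i - 1])
    return folded


# Hypothetical derived-allele counts at seven sites, sample of 6 chromosomes.
counts = [1, 1, 2, 5, 0, 6, 3]
spectrum = unfolded_sfs(counts, n=6)   # [2, 1, 1, 0, 1]
folded = fold_sfs(spectrum)            # [3, 1, 1]
```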
Affiliation(s)
- Paula Tataru
  - Bioinformatics Research Center, Aarhus University, Aarhus, Denmark
- Thomas Bataillon
  - Bioinformatics Research Center, Aarhus University, Aarhus, Denmark.
|
16
|
Gupta MK, Vadde R. Genetic Basis of Adaptation and Maladaptation via Balancing Selection. ZOOLOGY 2019; 136:125693. [PMID: 31513936 DOI: 10.1016/j.zool.2019.125693] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/07/2019] [Accepted: 07/03/2019] [Indexed: 10/26/2022]
|
17
|
Abstract
Recombination rates vary within and between species. A gene that causes a difference in rate and pattern of crossing over between two species of Drosophila has been identified in a new study, and shown to evolve under natural selection.
Affiliation(s)
- Brian Charlesworth
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FL, UK.
|
18
|
Rey C, Lanore V, Veber P, Guéguen L, Lartillot N, Sémon M, Boussau B. Detecting adaptive convergent amino acid evolution. Philos Trans R Soc Lond B Biol Sci 2019; 374:20180234. [PMID: 31154974 PMCID: PMC6560273 DOI: 10.1098/rstb.2018.0234] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Accepted: 02/25/2019] [Indexed: 11/12/2022] Open
Abstract
In evolutionary genomics, researchers have taken an interest in identifying substitutions that subtend convergent phenotypic adaptations. This is a difficult question that requires distinguishing foreground convergent substitutions that are involved in the convergent phenotype from background convergent substitutions. Those may be linked to other adaptations, may be neutral or may be the consequence of mutational biases. Furthermore, there is no generally accepted definition of convergent substitutions. Various methods that use different definitions have been proposed in the literature, resulting in different sets of candidate foreground convergent substitutions. In this article, we first describe the processes that can generate foreground convergent substitutions in coding sequences, separating adaptive from non-adaptive processes. Second, we review methods that have been proposed to detect foreground convergent substitutions in coding sequences and expose the assumptions that underlie them. Finally, we examine their power on simulations of convergent changes-including in the presence of a change in the efficacy of selection-and on empirical alignments. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.
Affiliation(s)
- Carine Rey
  - ENS de Lyon, CNRS UMR 5239, INSERM U1210, LBMC, Univ Lyon, Université Claude Bernard Lyon 1, F-69007 Lyon, France
- Vincent Lanore
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Philippe Veber
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Laurent Guéguen
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Nicolas Lartillot
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
- Marie Sémon
  - ENS de Lyon, CNRS UMR 5239, INSERM U1210, LBMC, Univ Lyon, Université Claude Bernard Lyon 1, F-69007 Lyon, France
- Bastien Boussau
  - CNRS UMR 5558, LBBE, Univ Lyon, Université Claude Bernard Lyon 1, F-69100 Villeurbanne, France
|
19
|
Tang Y, Li M, Sun J, Zhang T, Zhang J, Zheng P. TRCMGene: A two-step referential compression method for the efficient storage of genetic data. PLoS One 2018; 13:e0206521. [PMID: 30395579 PMCID: PMC6218042 DOI: 10.1371/journal.pone.0206521] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/21/2018] [Accepted: 10/08/2018] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The massive quantities of genetic data generated by high-throughput sequencing pose challenges to data storage, transmission and analyses. These problems are effectively solved through data compression, in which the size of data storage is reduced and the speed of data transmission is improved. Several options are available for compressing and storing genetic data. However, most of these options either do not provide sufficient compression rates or require a considerable length of time for decompression and loading. RESULTS Here, we propose TRCMGene, a lossless genetic data compression method that uses a referential compression scheme. The novel concept of two-step compression method, which builds an index structure using K-means and k-nearest neighbours, is introduced to TRCMGene. Evaluation with several real datasets revealed that the compression factor of TRCMGene ranges from 9 to 21. TRCMGene presents a good balance between compression factor and reading time. On average, the reading time of compressed data is 60% of that of uncompressed data. Thus, TRCMGene not only saves disc space but also saves file access time and speeds up data loading. These effects collectively improve genetic data storage and transmission in the current hardware environment and render system upgrades unnecessary. TRCMGene, user manual and demos could be accessed freely from https://github.com/tangyou79/TRCM. The data mentioned in this manuscript could be downloaded from: https://github.com/tangyou79/TRCM/wiki.
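The general idea of referential compression mentioned in this abstract can be sketched in Python as follows. This is a generic greedy k-mer matcher written for illustration, not the TRCMGene algorithm: it does not use the K-means and k-nearest-neighbour index the authors describe. All function names, the value of `k`, and the toy sequences are assumptions.

```python
def build_index(reference, k):
    """Map each k-mer of the reference to the position of its first occurrence."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)
    return index


def compress(target, reference, k=4):
    """Encode target as ('ref', start, length) copies from the reference
    plus ('lit', text) runs for stretches with no k-mer match."""
    index = build_index(reference, k)
    ops, literal, i = [], [], 0
    while i < len(target):
        start = index.get(target[i:i + k])
        if start is None:
            literal.append(target[i])      # no match: keep the raw base
            i += 1
            continue
        length = k
        # Extend the match for as long as target and reference agree.
        while (i + length < len(target)
               and start + length < len(reference)
               and target[i + length] == reference[start + length]):
            length += 1
        if literal:
            ops.append(("lit", "".join(literal)))
            literal = []
        ops.append(("ref", start, length))
        i += length
    if literal:
        ops.append(("lit", "".join(literal)))
    return ops


def decompress(ops, reference):
    """Rebuild the target exactly from the operations and the reference."""
    parts = []
    for op in ops:
        if op[0] == "ref":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


# Toy sequences: the target differs from the reference by one base.
reference = "ACGTACGTTTGACCA"
target = "ACGTACGTTAGACCA"
ops = compress(target, reference)
restored = decompress(ops, reference)
```

Because only copy positions and unmatched bases are stored, a target that closely resembles the reference reduces to a handful of operations, and decompression needs nothing but the reference and that operation list.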
Affiliation(s)
- You Tang
  - Electrical and Information Engineering College, JiLin Agricultural Science and Technology University, Jilin, China
- Min Li
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
- Jing Sun
  - College of Life Science and Agriculture, Qiqihar University, Qiqihar, China
- Tao Zhang
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
- Jicheng Zhang
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
  - * E-mail: (JCZ); (PZ)
- Ping Zheng
  - College of Electrical and Information, Northeast Agricultural University, Harbin, China
  - * E-mail: (JCZ); (PZ)
|
20
|
Sironi M, Forni D, Clerici M, Cagliani R. Genetic conflicts with Plasmodium parasites and functional constraints shape the evolution of erythrocyte cytoskeletal proteins. Sci Rep 2018; 8:14682. [PMID: 30279439 PMCID: PMC6168477 DOI: 10.1038/s41598-018-33049-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/14/2018] [Accepted: 09/19/2018] [Indexed: 11/19/2022] Open
Abstract
Plasmodium parasites exerted a strong selective pressure on primate genomes and mutations in genes encoding erythrocyte cytoskeleton proteins (ECP) determine protective effects against Plasmodium infection/pathogenesis. We thus hypothesized that ECP-encoding genes have evolved in response to Plasmodium-driven selection. We analyzed the evolutionary history of 15 ECP-encoding genes in primates, as well as of their Plasmodium-encoded ligands (KAHRP, MESA and EMP3). Results indicated that EPB42, SLC4A1, and SPTA1 evolved under pervasive positive selection and that episodes of positive selection tended to occur more frequently in primate species that host a larger number of Plasmodium parasites. Conversely, several genes, including ANK1 and SPTB, displayed extensive signatures of purifying selection in primate phylogenies, Homininae lineages, and human populations, suggesting strong functional constraints. Analysis of Plasmodium genes indicated adaptive evolution in MESA and KAHRP; in the latter, different positively selected sites were located in the spectrin-binding domains. Because most of the positively selected sites in alpha-spectrin localized to the domains involved in the interaction with KAHRP, we suggest that the two proteins are engaged in an arms-race scenario. This observation is relevant because KAHRP is essential for the formation of “knobs”, which represent a major virulence determinant for P. falciparum.
Affiliation(s)
- Manuela Sironi
  - Bioinformatics, Scientific Institute, IRCCS E. Medea, 23842, Bosisio Parini, Lecco, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute, IRCCS E. Medea, 23842, Bosisio Parini, Lecco, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, 20090, Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, 20148, Milan, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute, IRCCS E. Medea, 23842, Bosisio Parini, Lecco, Italy.
|
21
|
Savisaar R, Hurst LD. Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 2018; 28:1442-1454. [PMID: 30143596 PMCID: PMC6169883 DOI: 10.1101/gr.233999.117] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 01/11/2018] [Accepted: 07/31/2018] [Indexed: 01/17/2023]
Abstract
What proportion of coding sequence nucleotides have roles in splicing, and how strong is the selection that maintains them? Despite a large body of research into exonic splice regulatory signals, these questions have not been answered. This is because, to our knowledge, previous investigations have not explicitly disentangled the frequency of splice regulatory elements from the strength of the evolutionary constraint under which they evolve. Current data are consistent both with a scenario of weak and diffuse constraint, enveloping large swaths of sequence, as well as with well-defined pockets of strong purifying selection. In the former case, natural selection on exonic splice enhancers (ESEs) might primarily act as a slight modifier of codon usage bias. In the latter, mutations that disrupt ESEs are likely to have large fitness and, potentially, clinical effects. To distinguish between these scenarios, we used several different methods to determine the distribution of selection coefficients for new mutations within ESEs. The analyses converged to suggest that ∼15%-20% of fourfold degenerate sites are part of functional ESEs. Most of these sites are under strong evolutionary constraint. Therefore, exonic splice regulation does not simply impose a weak bias that gently nudges coding sequence evolution in a particular direction. Rather, the selection to preserve these motifs is a strong force that severely constrains the evolution of a substantial proportion of coding nucleotides. Thus synonymous mutations that disrupt ESEs should be considered as a potentially common cause of single-locus genetic disorders.
Affiliation(s)
- Rosina Savisaar
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, United Kingdom
|
22
|
Multiple Selected Changes May Modulate the Molecular Interaction between Laverania RH5 and Primate Basigin. mBio 2018; 9:mBio.00476-18. [PMID: 29789367 PMCID: PMC5964352 DOI: 10.1128/mbio.00476-18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/04/2022] Open
|
23
|
Distinguishing Among Evolutionary Forces Acting on Genome-Wide Base Composition: Computer Simulation Analysis of Approximate Methods for Inferring Site Frequency Spectra of Derived Mutations. G3-GENES GENOMES GENETICS 2018; 8:1755-1769. [PMID: 29588382 PMCID: PMC5940166 DOI: 10.1534/g3.117.300512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Inferred ancestral nucleotide states are increasingly employed in analyses of within- and between-species genome variation. Although numerous studies have focused on ancestral inference among distantly related lineages, approaches to infer ancestral states in polymorphism data have received less attention. Recently developed approaches that employ complex transition matrices allow us to infer ancestral nucleotide sequence in various evolutionary scenarios of base composition. However, the requirement of a single gene tree to calculate a likelihood is an important limitation for conducting ancestral inference using within-species variation in recombining genomes. To resolve this problem, and to extend the applicability of ancestral inference in studies of base composition evolution, we first evaluate three previously proposed methods to infer ancestral nucleotide sequences among within- and between-species sequence variation data. The methods employ a single allele, bifurcating tree, or a star tree for within-species variation data. Using simulated nucleotide sequences, we employ ancestral inference to infer fixations and polymorphisms. We find that all three methods show biased inference. We modify the bifurcating tree method to include weights to adjust for an expected site frequency spectrum, “bifurcating tree with weighting” (BTW). Our simulation analysis shows that the BTW method can substantially improve the reliability and robustness of ancestral inference in a range of scenarios that include non-neutral and/or non-stationary base composition evolution.
|
24
|
Brand CL, Cattani MV, Kingan SB, Landeen EL, Presgraves DC. Molecular Evolution at a Meiosis Gene Mediates Species Differences in the Rate and Patterning of Recombination. Curr Biol 2018; 28:1289-1295.e4. [PMID: 29606420 DOI: 10.1016/j.cub.2018.02.056] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 01/16/2018] [Revised: 02/15/2018] [Accepted: 02/20/2018] [Indexed: 10/17/2022]
Abstract
Crossing over between homologous chromosomes during meiosis repairs programmed DNA double-strand breaks, ensures proper segregation at meiosis I [1], shapes the genomic distribution of nucleotide variability in populations, and enhances the efficacy of natural selection among genetically linked sites [2]. Between closely related Drosophila species, large differences exist in the rate and chromosomal distribution of crossing over. Little, however, is known about the molecular genetic changes or population genetic forces that mediate evolved differences in recombination between species [3, 4]. Here, we show that a meiosis gene with a history of rapid evolution acts as a trans-acting modifier of species differences in crossing over. In transgenic flies, the dicistronic gene, mei-217/mei-218, recapitulates a large part of the species differences in the rate and chromosomal distribution of crossing over. These phenotypic differences appear to result from changes in protein sequence not gene expression. Our population genetics analyses show that the protein-coding sequence of mei-218, but not mei-217, has a history of recurrent positive natural selection. By modulating the intensity of centromeric and telomeric suppression of crossing over, evolution at mei-217/-218 has incidentally shaped gross differences in the chromosomal distribution of nucleotide variability between species. We speculate that recurrent bouts of adaptive evolution at mei-217/-218 might reflect a history of coevolution with selfish genetic elements.
Affiliation(s)
- Cara L Brand
  - Department of Biology, University of Rochester, Rochester, NY 14627, USA
- M Victoria Cattani
  - Department of Biology, University of Rochester, Rochester, NY 14627, USA
- Sarah B Kingan
  - Department of Biology, University of Rochester, Rochester, NY 14627, USA
- Emily L Landeen
  - Department of Biology, University of Rochester, Rochester, NY 14627, USA
- Daven C Presgraves
  - Department of Biology, University of Rochester, Rochester, NY 14627, USA.
|
25
|
Young BC, Wu CH, Gordon NC, Cole K, Price JR, Liu E, Sheppard AE, Perera S, Charlesworth J, Golubchik T, Iqbal Z, Bowden R, Massey RC, Paul J, Crook DW, Peto TE, Walker AS, Llewelyn MJ, Wyllie DH, Wilson DJ. Severe infections emerge from commensal bacteria by adaptive evolution. eLife 2017; 6. [PMID: 29256859 PMCID: PMC5736351 DOI: 10.7554/elife.30637] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 07/21/2017] [Accepted: 12/02/2017] [Indexed: 12/23/2022] Open
Abstract
Bacteria responsible for the greatest global mortality colonize the human microbiota far more frequently than they cause severe infections. Whether mutation and selection among commensal bacteria are associated with infection is unknown. We investigated de novo mutation in 1163 Staphylococcus aureus genomes from 105 infected patients with nose colonization. We report that 72% of infections emerged from the nose, with infecting and nose-colonizing bacteria showing parallel adaptive differences. We found 2.8-to-3.6-fold adaptive enrichments of protein-altering variants in genes responding to rsp, which regulates surface antigens and toxin production; agr, which regulates quorum-sensing, toxin production and abscess formation; and host-derived antimicrobial peptides. Adaptive mutations in pathogenesis-associated genes were 3.1-fold enriched in infecting but not nose-colonizing bacteria. None of these signatures were observed in healthy carriers nor at the species-level, suggesting infection-associated, short-term, within-host selection pressures. Our results show that signatures of spontaneous adaptive evolution are specifically associated with infection, raising new possibilities for diagnosis and treatment.
Collapse
Affiliation(s)
- Bernadette C Young
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; Microbiology and Infectious Diseases Department, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Chieh-Hsi Wu
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom
- N Claire Gordon
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom
- Kevin Cole
  - Department of Infectious Diseases and Microbiology, Royal Sussex County Hospital, Brighton, United Kingdom
- James R Price
  - Department of Infectious Diseases and Microbiology, Royal Sussex County Hospital, Brighton, United Kingdom; Department of Global Health and Infection, Brighton and Sussex Medical School, University of Sussex, Brighton, United Kingdom
- Elian Liu
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; Microbiology and Infectious Diseases Department, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Anna E Sheppard
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; NIHR Health Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, United Kingdom
- Sanuki Perera
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; Microbiology and Infectious Diseases Department, Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom
- Jane Charlesworth
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom
- Tanya Golubchik
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom
- Zamin Iqbal
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Rory Bowden
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Ruth C Massey
  - School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
- John Paul
  - National Infection Service, Public Health England, London, United Kingdom; National Institute for Health Research, Oxford Biomedical Research Centre, Oxford, United Kingdom
- Derrick W Crook
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; National Infection Service, Public Health England, London, United Kingdom; National Institute for Health Research, Oxford Biomedical Research Centre, Oxford, United Kingdom
- Timothy E Peto
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; National Institute for Health Research, Oxford Biomedical Research Centre, Oxford, United Kingdom
- A Sarah Walker
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; National Institute for Health Research, Oxford Biomedical Research Centre, Oxford, United Kingdom
- Martin J Llewelyn
  - Department of Infectious Diseases and Microbiology, Royal Sussex County Hospital, Brighton, United Kingdom; Department of Global Health and Infection, Brighton and Sussex Medical School, University of Sussex, Brighton, United Kingdom
- David H Wyllie
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; Centre for Molecular and Cellular Physiology, Jenner Institute, Oxford, United Kingdom
- Daniel J Wilson
  - Nuffield Department of Medicine, Experimental Medicine Division, University of Oxford, Oxford, United Kingdom; Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom; Institute for Emerging Infections, Oxford Martin School, University of Oxford, Oxford, United Kingdom
26
Rizzato F, Rodriguez A, Biarnés X, Laio A. Predicting Amino Acid Substitution Probabilities Using Single Nucleotide Polymorphisms. Genetics 2017; 207:643-652. [PMID: 28754661 PMCID: PMC5629329 DOI: 10.1534/genetics.117.300078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/23/2016] [Accepted: 07/18/2017] [Indexed: 11/18/2022] Open
Abstract
Fast genome sequencing offers invaluable opportunities for building updated and improved models of protein sequence evolution. We here show that Single Nucleotide Polymorphisms (SNPs) can be used to build a model capable of predicting the probability of substitution between amino acids in variants of the same protein in different species. The model is based on a substitution matrix inferred from the frequency of codon interchanges observed in a suitably selected subset of human SNPs, and predicts the substitution probabilities observed in alignments between Homo sapiens and related species at 85-100% of sequence identity better than any other approach we are aware of. The model gradually loses its predictive power at lower sequence identity. Our results suggest that SNPs can be employed, together with multiple sequence alignment data, to model protein sequence evolution. The SNP-based substitution matrix developed in this work can be exploited to better align protein sequences of related organisms, to refine the estimate of the evolutionary distance between protein variants from related species in phylogenetic trees and, in perspective, might become a useful tool for population analysis.
Affiliation(s)
- Francesca Rizzato
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy
- Alex Rodriguez
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy
- Xevi Biarnés
  - Laboratory of Biochemistry, Institut Químic de Sarrià (IQS), Universitat Ramon Llull (URL), 08017 Barcelona, Spain
- Alessandro Laio
  - Scuola Internazionale Superiore di Studi Avanzati (SISSA), 34136 Trieste, Italy
  - The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
27
Inference of Distribution of Fitness Effects and Proportion of Adaptive Substitutions from Polymorphism Data. Genetics 2017; 207:1103-1119. [PMID: 28951530 PMCID: PMC5676230 DOI: 10.1534/genetics.117.300323] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 07/05/2017] [Accepted: 09/13/2017] [Indexed: 11/18/2022] Open
Abstract
The distribution of fitness effects (DFE) encompasses the fraction of deleterious, neutral, and beneficial mutations. It conditions the evolutionary trajectory of populations, as well as the rate of adaptive molecular evolution (α). Inferring DFE and α from patterns of polymorphism, as given through the site frequency spectrum (SFS) and divergence data, has been a longstanding goal of evolutionary genetics. A widespread assumption shared by previous inference methods is that beneficial mutations only contribute negligibly to the polymorphism data. Hence, a DFE comprising only deleterious mutations tends to be estimated from SFS data, and α is then predicted by contrasting the SFS with divergence data from an outgroup. We develop a hierarchical probabilistic framework that extends previous methods to infer DFE and α from polymorphism data alone. We use extensive simulations to examine the performance of our method. While an outgroup is still needed to obtain an unfolded SFS, we show that both a DFE, comprising both deleterious and beneficial mutations, and α can be inferred without using divergence data. We also show that not accounting for the contribution of beneficial mutations to polymorphism data leads to substantially biased estimates of the DFE and α. We compare our framework with one of the most widely used inference methods available and apply it on a recently published chimpanzee exome data set.
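For orientation, the divergence-based contrast that the abstract attributes to earlier methods can be illustrated with the classic McDonald-Kreitman-style point estimate, α = 1 − (Ds·Pn)/(Dn·Ps). This is a minimal sketch of that older estimator only, not the hierarchical probabilistic framework the paper develops; the function name `mk_alpha` and the counts below are hypothetical and chosen purely for illustration.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """McDonald-Kreitman-style estimate of the proportion of adaptive substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    Returns alpha = 1 - (ds * pn) / (dn * ps).
    """
    if dn == 0 or ps == 0:
        raise ValueError("Dn and Ps must be non-zero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only (not taken from any study).
alpha = mk_alpha(dn=60, ds=100, pn=30, ps=100)
print(f"alpha = {alpha:.2f}")
```

When Dn/Ds equals Pn/Ps the estimate is zero, which is the neutral expectation; the abstract's point is that this kind of contrast treats polymorphism as free of beneficial mutations, an assumption the authors argue leads to biased estimates.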
28
Pontremoli C, Forni D, Cagliani R, Pozzoli U, Riva S, Bravo IG, Clerici M, Sironi M. Evolutionary analysis of Old World arenaviruses reveals a major adaptive contribution of the viral polymerase. Mol Ecol 2017; 26:5173-5188. [PMID: 28779541 DOI: 10.1111/mec.14282] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/01/2017] [Revised: 07/25/2017] [Accepted: 07/31/2017] [Indexed: 12/17/2022]
Abstract
The Old World (OW) arenavirus complex includes several species of rodent-borne viruses, some of which (i.e., Lassa virus, LASV and Lymphocytic choriomeningitis virus, LCMV) cause human diseases. Most LCMV and LASV infections are caused by rodent-to-human transmissions. Thus, viral evolution is largely determined by events that occur in the wildlife reservoirs. We used a set of human- and rodent-derived viral sequences to investigate the evolutionary history underlying OW arenavirus speciation, as well as the more recent selective events that accompanied LASV spread in West Africa. We show that the viral RNA polymerase (L protein) was a major positive selection target in OW arenaviruses and during LASV out-of-Nigeria migration. No evidence of selection was observed for the glycoprotein, whereas positive selection acted on the nucleoprotein (NP) during LCMV speciation. Positively selected sites in L and NP are surrounded by highly conserved residues, and the bulk of the viral genome evolves under purifying selection. Several positively selected sites are likely to modulate viral replication/transcription. In both L and NP, structural features (solvent exposed surface area) are important determinants of site-wise evolutionary rate variation. By incorporating several rodent-derived sequences, we also performed an analysis of OW arenavirus codon adaptation to the human host. Results do not support a previously hypothesized role of codon adaptation in disease severity for non-Nigerian strains. In conclusion, L and NP represent the major selection targets and possible determinants of disease presentation; these results suggest that field surveys and experimental studies should primarily focus on these proteins.
Affiliation(s)
- Chiara Pontremoli
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Uberto Pozzoli
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Stefania Riva
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
- Ignacio G Bravo
  - Laboratory MIVEGEC, UMR CNRS 5290, IRD 224, UM, Centre National de la Recherche Scientifique, Montpellier, France
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, Milan, Italy; Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
29
Mozzi A, Guerini FR, Forni D, Costa AS, Nemni R, Baglio F, Cabinio M, Riva S, Pontremoli C, Clerici M, Sironi M, Cagliani R. REST, a master regulator of neurogenesis, evolved under strong positive selection in humans and in non human primates. Sci Rep 2017; 7:9530. [PMID: 28842657 PMCID: PMC5573535 DOI: 10.1038/s41598-017-10245-w] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 05/18/2017] [Accepted: 08/07/2017] [Indexed: 12/03/2022] Open
Abstract
The transcriptional repressor REST regulates many neuronal genes by binding RE1 motifs. About one third of human RE1s are recently evolved and specific to primates. As changes in the activity of a transcription factor reverberate on its downstream targets, we assessed whether REST displays fast evolutionary rates in primates. We show that REST was targeted by very strong positive selection during primate evolution. Positive selection was also evident in the human lineage, with six selected sites located in a region that surrounds a VNTR in exon 4. Analysis of expression data indicated that REST brain expression peaks during aging in humans but not in other primates. Because a REST coding variant (rs3796529) was previously associated with protection from hippocampal atrophy in elderly subjects with mild cognitive impairment (MCI), we analyzed a cohort of Alzheimer disease (AD) continuum patients. Genotyping of two coding variants (rs3796529 and rs2227902) located in the region surrounding the VNTR indicated a role for rs2227902 in modulation of hippocampal volume loss, indirectly confirming a role for REST in neuroprotection. Experimental studies will be instrumental to determine the functional effect of positively selected sites in REST and the role of REST variants in neuropreservation/neurodegeneration.
Affiliation(s)
- Alessandra Mozzi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Monia Cabinio
  - Don C. Gnocchi Foundation ONLUS, IRCCS, 20148, Milan, Italy
- Stefania Riva
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Chiara Pontremoli
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Mario Clerici
  - Don C. Gnocchi Foundation ONLUS, IRCCS, 20148, Milan, Italy; Department of Physiopathology and Transplantation, University of Milan, 20090, Milan, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
30
Mozzi A, Forni D, Cagliani R, Pozzoli U, Clerici M, Sironi M. Distinct selective forces and Neanderthal introgression shaped genetic diversity at genes involved in neurodevelopmental disorders. Sci Rep 2017; 7:6116. [PMID: 28733602 PMCID: PMC5522412 DOI: 10.1038/s41598-017-06440-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/10/2017] [Accepted: 06/13/2017] [Indexed: 01/11/2023] Open
Abstract
In addition to high intelligence, humans evolved specialized social-cognitive skills, which are specifically affected in children with autism spectrum disorder (ASD). Genes affected in ASD represent suitable candidates to study the evolution of human social cognition. We performed an evolutionary analysis on 68 genes associated to neurodevelopmental disorders; our data indicate that genetic diversity was shaped by distinct selective forces, including natural selection and introgression from archaic hominins. We discuss the possibility that segregation distortion during spermatogenesis accounts for a subset of ASD mutations. Finally, we detected modern-human-specific alleles in DYRK1A and TCF4. These variants are located within regions that display chromatin features typical of transcriptional enhancers in several brain areas, strongly suggesting a regulatory role. These SNPs thus represent candidates for association with neurodevelopmental disorders, and await experimental validation in future studies.
Affiliation(s)
- Alessandra Mozzi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Uberto Pozzoli
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, 20090, Milan, Italy; Don C. Gnocchi Foundation ONLUS, IRCCS, 20100, Milan, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
31
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 01/26/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
32
Al-Daghri NM, Pontremoli C, Cagliani R, Forni D, Alokail MS, Al-Attas OS, Sabico S, Riva S, Clerici M, Sironi M. Susceptibility to type 2 diabetes may be modulated by haplotypes in G6PC2, a target of positive selection. BMC Evol Biol 2017; 17:43. [PMID: 28173748 PMCID: PMC5297017 DOI: 10.1186/s12862-017-0897-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/02/2016] [Accepted: 01/26/2017] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: The endoplasmic reticulum enzyme glucose-6-phosphatase catalyzes the common terminal reaction in the gluconeogenic/glycogenolytic pathways and plays a central role in glucose homeostasis. In most mammals, different G6PC subunits are encoded by three paralogous genes (G6PC, G6PC2, and G6PC3). Mutations in G6PC and G6PC3 are responsible for human mendelian diseases, whereas variants in G6PC2 are associated with fasting glucose (FG) levels.
RESULTS: We analyzed the evolutionary history of G6Pase genes. Results indicated that the three paralogs originated during early vertebrate evolution and that negative selection was the major force shaping diversity at these genes in mammals. Nonetheless, site-wise estimation of evolutionary rates at corresponding sites revealed weak correlations, suggesting that mammalian G6Pases have evolved different structural features over time. We also detected pervasive positive selection at mammalian G6PC2. Most selected residues localize in the C-terminal protein region, where several human variants associated with FG levels also map. This region was re-sequenced in ~560 subjects from Saudi Arabia, 185 of whom suffering from type 2 diabetes (T2D). The frequency of rare missense and nonsense variants was not significantly different in T2D and controls. Association analysis with two common missense variants (V219L and S342C) revealed a weak but significant association for both SNPs when analyses were conditioned on rs560887, previously identified in a GWAS for FG. Two haplotypes were significantly associated with T2D with an opposite effect direction.
CONCLUSIONS: We detected pervasive positive selection at mammalian G6PC2 genes and we suggest that distinct haplotypes at the G6PC2 locus modulate susceptibility to T2D.
Affiliation(s)
- Nasser M Al-Daghri
  - Biomarker research program, Biochemistry Department, College of Science, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia; Prince Mutaib Chair for Biomarkers of Osteoporosis Research, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia
- Rachele Cagliani
  - Scientific Institute IRCCS E.MEDEA, Bosisio Parini, 23842, Italy
- Diego Forni
  - Scientific Institute IRCCS E.MEDEA, Bosisio Parini, 23842, Italy
- Majed S Alokail
  - Biomarker research program, Biochemistry Department, College of Science, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia; Prince Mutaib Chair for Biomarkers of Osteoporosis Research, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia
- Omar S Al-Attas
  - Biomarker research program, Biochemistry Department, College of Science, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia; Prince Mutaib Chair for Biomarkers of Osteoporosis Research, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia
- Shaun Sabico
  - Biomarker research program, Biochemistry Department, College of Science, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia; Prince Mutaib Chair for Biomarkers of Osteoporosis Research, King Saud University, Riyadh, 11451, Kingdom of Saudi Arabia
- Stefania Riva
  - Scientific Institute IRCCS E.MEDEA, Bosisio Parini, 23842, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, via F.lli Cervi 93, Segrate, 20090, Milan, Italy; Don Gnocchi Foundation, ONLUS, Milan, 20148, Italy
- Manuela Sironi
  - Scientific Institute IRCCS E.MEDEA, Bosisio Parini, 23842, Italy
33
Fijarczyk A, Dudek K, Babik W. Selective Landscapes in newt Immune Genes Inferred from Patterns of Nucleotide Variation. Genome Biol Evol 2016; 8:3417-3432. [PMID: 27702815 PMCID: PMC5203778 DOI: 10.1093/gbe/evw236] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/11/2022] Open
Abstract
Host–pathogen interactions may result in either directional selection or in pressure for the maintenance of polymorphism at the molecular level. Hence signatures of both positive and balancing selection are expected in immune genes. Because both overall selective pressure and specific targets may differ between species, large-scale population genomic studies are useful in detecting functionally important immune genes and comparing selective landscapes between taxa. Such studies are of particular interest in amphibians, a group threatened worldwide by emerging infectious diseases. Here, we present an analysis of polymorphism and divergence of 634 immune genes in two lineages of Lissotriton newts: L. montandoni and L. vulgaris graecus. Variation in newt immune genes has been shaped predominantly by widespread purifying selection and strong evolutionary constraint, implying long-term importance of these genes for functioning of the immune system. The two evolutionary lineages differ in the overall strength of purifying selection which can partially be explained by demographic history but may also signal differences in long-term pathogen pressure. The prevalent constraint notwithstanding, 23 putative targets of positive selection and 11 putative targets of balancing selection were identified. The latter were detected by composite tests involving the demographic model and further validated in independent population samples. Putative targets of balancing selection encode proteins which may interact closely with pathogens but include also regulators of immune response. The identified candidates will be useful for testing whether genes affected by balancing selection are more prone to interspecific introgression than other genes in the genome.
Affiliation(s)
- Anna Fijarczyk
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Katarzyna Dudek
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Wieslaw Babik
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
34
Pontremoli C, Forni D, Cagliani R, Filippi G, De Gioia L, Pozzoli U, Clerici M, Sironi M. Positive Selection Drives Evolution at the Host-Filovirus Interaction Surface. Mol Biol Evol 2016; 33:2836-2847. [PMID: 27512112 DOI: 10.1093/molbev/msw158] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
Filovirus infection is mediated by engagement of the surface-exposed glycoprotein (GP) by its cellular receptor, NPC1 (Niemann-Pick C1). Two loops in the C domain of NPC1 (NPC1-C) bind filovirus GP. Herein, we show that filovirus GP and NPC1-C evolve under mutual selective pressure. Analysis of a large mammalian phylogeny indicated that strong functional/structural constraints limit the NPC1 sequence space available for adaptive change and most sites at the contact interface with GP are under negative selection. These constraints notwithstanding, we detected positive selection at NPC1-C in all mammalian orders, from Primates to Xenarthra. Different codons evolved adaptively in distinct mammals, and most selected sites are located within the two NPC1-C loops that engage GP, or at their anchor points. In Homininae, NPC1-C was a preferential selection target, and the T419I variant possibly represents a human-specific adaptation to filovirus infection. On the other side of the arms-race, GP evolved adaptively during filovirus speciation. One of the selected sites (S142Q) establishes several atom-to-atom contacts with NPC1-C. Additional selected sites are located within epitopes recognized by neutralizing antibodies, including the 14G7 epitope, where sites selected during the recent EBOV epidemic also map. Finally, pairs of co-evolving sites in Marburgviruses and Ebolaviruses were found to involve antigenic determinants. These findings suggest that the host humoral immune response was a major selective pressure during filovirus speciation. The S142Q variant may contribute to determine Ebolavirus host range in the wild. If this were the case, EBOV/BDBV (S142) and SUDV (Q142) may not share the same reservoir(s).
Affiliation(s)
- Chiara Pontremoli
  - Scientific Institute IRCCS E.MEDEA, Bioinformatics, Bosisio Parini, Italy
- Diego Forni
  - Scientific Institute IRCCS E.MEDEA, Bioinformatics, Bosisio Parini, Italy
- Rachele Cagliani
  - Scientific Institute IRCCS E.MEDEA, Bioinformatics, Bosisio Parini, Italy
- Giulia Filippi
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan, Italy
- Luca De Gioia
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, Milan, Italy
- Uberto Pozzoli
  - Scientific Institute IRCCS E.MEDEA, Bioinformatics, Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, Milan, Italy; Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Manuela Sironi
  - Scientific Institute IRCCS E.MEDEA, Bioinformatics, Bosisio Parini, Italy
35
Elyashiv E, Sattath S, Hu TT, Strutsovsky A, McVicker G, Andolfatto P, Coop G, Sella G. A Genomic Map of the Effects of Linked Selection in Drosophila. PLoS Genet 2016; 12:e1006130. [PMID: 27536991 PMCID: PMC4990265 DOI: 10.1371/journal.pgen.1006130] [Citation(s) in RCA: 90] [Impact Index Per Article: 11.3] [Received: 01/05/2015] [Accepted: 05/26/2016] [Indexed: 01/23/2023] Open
Abstract
Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of "linked selection" on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTR). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in untranslated regions (UTRs). Our inference further suggests a substantial effect of other modes of linked selection and of adaptation in particular. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
Affiliation(s)
- Eyal Elyashiv
  - Department of Ecology, Evolution, and Behavior, Hebrew University of Jerusalem, Jerusalem, Israel
  - Department of Biological Sciences, Columbia University, New York, New York, United States of America
- Shmuel Sattath
  - Department of Ecology, Evolution, and Behavior, Hebrew University of Jerusalem, Jerusalem, Israel
- Tina T. Hu
  - Department of Ecology and Evolutionary Biology and the Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Alon Strutsovsky
  - Department of Ecology, Evolution, and Behavior, Hebrew University of Jerusalem, Jerusalem, Israel
- Graham McVicker
  - The Laboratory of Genetics and The Integrative Biology Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
- Peter Andolfatto
  - Department of Ecology and Evolutionary Biology and the Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Graham Coop
  - Department of Evolution and Ecology, University of California, Davis, Davis, California, United States of America
- Guy Sella
  - Department of Biological Sciences, Columbia University, New York, New York, United States of America
36
Hemmer LW, Blumenstiel JP. Holding it together: rapid evolution and positive selection in the synaptonemal complex of Drosophila. BMC Evol Biol 2016; 16:91. [PMID: 27150275 PMCID: PMC4857336 DOI: 10.1186/s12862-016-0670-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 01/26/2016] [Accepted: 04/27/2016] [Indexed: 11/21/2022] Open
Abstract
Background: The synaptonemal complex (SC) is a highly conserved meiotic structure that functions to pair homologs and facilitate meiotic recombination in most eukaryotes. Five Drosophila SC proteins have been identified and localized within the complex: C(3)G, C(2)M, CONA, ORD, and the newly identified Corolla. The SC is required for meiotic recombination in Drosophila and absence of these proteins leads to reduced crossing over and chromosomal nondisjunction. Despite the conserved nature of the SC and the key role that these five proteins have in meiosis in D. melanogaster, they display little apparent sequence conservation outside the genus. To identify factors that explain this lack of apparent conservation, we performed a molecular evolutionary analysis of these genes across the Drosophila genus.
Results: For the five SC components, gene sequence similarity declines rapidly with increasing phylogenetic distance and only ORD and C(2)M are identifiable outside of the Drosophila genus. SC gene sequences have a higher dN/dS (ω) rate ratio than the genome wide average and this can in part be explained by the action of positive selection in almost every SC component. Across the genus, there is significant variation in ω for each protein. It further appears that ω estimates for the five SC components are in accordance with their physical position within the SC. Components interacting with chromatin evolve slowest and components comprising the central elements evolve the most rapidly. Finally, using population genetic approaches, we demonstrate that positive selection on SC components is ongoing.
Conclusions: SC components within Drosophila show little apparent sequence homology to those identified in other model organisms due to their rapid evolution. We propose that the Drosophila SC is evolving rapidly due to two combined effects. First, we propose that a high rate of evolution can be partly explained by low purifying selection on protein components whose function is to simply hold chromosomes together. We also propose that positive selection in the SC is driven by its sex-specificity combined with its role in facilitating both recombination and centromere clustering in the face of recurrent bouts of drive in female meiosis.
Affiliation(s)
- Lucas W Hemmer
  - Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, 66045, USA
- Justin P Blumenstiel
  - Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, 66045, USA
37
Cagliani R, Forni D, Filippi G, Mozzi A, De Gioia L, Pontremoli C, Pozzoli U, Bresolin N, Clerici M, Sironi M. The mammalian complement system as an epitome of host-pathogen genetic conflicts. Mol Ecol 2016; 25:1324-39. [PMID: 26836579 DOI: 10.1111/mec.13558] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/21/2015] [Revised: 12/29/2015] [Accepted: 01/27/2016] [Indexed: 12/11/2022]
Abstract
The complement system is an innate immunity effector mechanism; its action is antagonized by a wide array of pathogens and complement evasion determines the virulence of several infections. We investigated the evolutionary history of the complement system and of bacterial-encoded complement-interacting proteins. Complement components targeted by several pathogens evolved under strong selective pressure in primates, with selection acting on residues at the contact interface with microbial/viral proteins. Positively selected sites in CFH and C4BPA account for the human specificity of gonococcal infection. Bacterial interactors evolved adaptively as well, with selected sites located at interaction surfaces with primate complement proteins. These results epitomize the expectation under a genetic conflict scenario whereby the host's and the pathogen's genes evolve within binding avoidance-binding seeking dynamics. In silico mutagenesis and protein-protein docking analyses supported this by showing that positively selected sites, both in the host's and in the pathogen's interacting partner, modulate binding.
Affiliation(s)
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Giulia Filippi
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, 20126, Milan, Italy
- Alessandra Mozzi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Luca De Gioia
  - Department of Biotechnology and Biosciences, University of Milan-Bicocca, 20126, Milan, Italy
- Chiara Pontremoli
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Uberto Pozzoli
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
- Nereo Bresolin
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
  - Dino Ferrari Centre, Department of Physiopathology and Transplantation, University of Milan, Fondazione Ca' Granda IRCCS Ospedale Maggiore Policlinico, 20122, Milan, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, 20090, Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, 20148, Milan, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842, Bosisio Parini, Italy
38
Mozzi A, Forni D, Clerici M, Pozzoli U, Mascheretti S, Guerini FR, Riva S, Bresolin N, Cagliani R, Sironi M. The evolutionary history of genes involved in spoken and written language: beyond FOXP2. Sci Rep 2016; 6:22157. [PMID: 26912479 PMCID: PMC4766443 DOI: 10.1038/srep22157] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 12/15/2015] [Accepted: 02/08/2016] [Indexed: 12/14/2022] Open
Abstract
Humans possess a communication system based on spoken and written language. Other animals can learn vocalization by imitation, but this is not equivalent to human language. Many genes were described to be implicated in language impairment (LI) and developmental dyslexia (DD), but their evolutionary history has not been thoroughly analyzed. Herein we analyzed the evolution of ten genes involved in DD and LI. Results show that the evolutionary history of LI genes for mammals and aves was comparable in vocal-learner species and non-learners. For the human lineage, several sites showing evidence of positive selection were identified in KIAA0319 and were already present in Neanderthals and Denisovans, suggesting that any phenotypic change they entailed was shared with archaic hominins. Conversely, in FOXP2, ROBO1, ROBO2, and CNTNAP2 non-coding changes rose to high frequency after the separation from archaic hominins. These variants are promising candidates for association studies in LI and DD.
Affiliation(s)
- Alessandra Mozzi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, 20090 Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, 20100 Milan, Italy
- Uberto Pozzoli
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy
- Sara Mascheretti
  - Child Psychopathology Unit, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Lecco, Italy
- Stefania Riva
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy
- Nereo Bresolin
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy
  - Dino Ferrari Centre, Department of Physiopathology and Transplantation, University of Milan, Fondazione Ca’ Granda IRCCS Ospedale Maggiore Policlinico, 20122 Milan, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy
39
Matsumoto T, John A, Baeza-Centurion P, Li B, Akashi H. Codon Usage Selection Can Bias Estimation of the Fraction of Adaptive Amino Acid Fixations. Mol Biol Evol 2016; 33:1580-9. [PMID: 26873577 DOI: 10.1093/molbev/msw027] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023] Open
Abstract
A growing number of molecular evolutionary studies are estimating the proportion of adaptive amino acid substitutions (α) from comparisons of ratios of polymorphic and fixed DNA mutations. Here, we examine how violations of two of the model assumptions, neutral evolution of synonymous mutations and stationary base composition, affect α estimation. We simulated the evolution of coding sequences assuming weak selection on synonymous codon usage bias and neutral protein evolution, α = 0. We show that weak selection on synonymous mutations can give polymorphism/divergence ratios that yield α-hat (estimated α) considerably larger than its true value. Nonstationary evolution (changes in population size, selection, or mutation) can exacerbate such biases or, in some scenarios, give biases in the opposite direction, α-hat < α. These results demonstrate that two factors that appear to be prevalent among taxa, weak selection on synonymous mutations and non-steady-state nucleotide composition, should be considered when estimating α. Estimates of the proportion of adaptive amino acid fixations from large-scale analyses of Drosophila melanogaster polymorphism and divergence data are positively correlated with codon usage bias. Such patterns are consistent with α-hat inflation from weak selection on synonymous mutations and/or mutational changes within the examined gene trees.
Affiliation(s)
- Tomotaka Matsumoto
  - Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Anoop John
  - Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Pablo Baeza-Centurion
  - Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Boyang Li
  - Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Hiroshi Akashi
  - Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
  - Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Yata, Mishima, Shizuoka, Japan
40
Forni D, Mozzi A, Pontremoli C, Vertemara J, Pozzoli U, Biasin M, Bresolin N, Clerici M, Cagliani R, Sironi M. Diverse selective regimes shape genetic diversity at ADAR genes and at their coding targets. RNA Biol 2015; 12:149-61. [PMID: 25826567 DOI: 10.1080/15476286.2015.1017215] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022] Open
Abstract
A-to-I RNA editing operated by ADAR enzymes is extremely common in mammals. Several editing events in coding regions have pivotal physiological roles and affect protein sequence (recoding events) or function. We analyzed the evolutionary history of the 3 ADAR family genes and of their coding targets. Evolutionary analysis indicated that ADAR evolved adaptively in primates, with the strongest selection in the unique N-terminal domain of the interferon-inducible isoform. Positively selected residues in the human lineage were also detected in the ADAR deaminase domain and in the RNA binding domains of ADARB1 and ADARB2. During the recent history of human populations distinct variants in the 3 genes increased in frequency as a result of local selective pressures. Most selected variants are located within regulatory regions and some are in linkage disequilibrium with eQTLs in monocytes. Finally, analysis of conservation scores of coding editing sites indicated that editing events are counter-selected within regions that are poorly tolerant to change. Nevertheless, a minority of recoding events occurs at highly conserved positions and possibly represents the functional fraction. These events are enriched in pathways related to HIV-1 infection and to epidermis/hair development. Thus, both ADAR genes and their targets evolved under variable selective regimes, including purifying and positive selection. Pressures related to immune response likely represented major drivers of evolution for ADAR genes. As for their coding targets, we suggest that most editing events are slightly deleterious, although a minority may be beneficial and contribute to antiviral response and skin homeostasis.
Key Words
- 1000G, 1000 Genomes Pilot Project
- A to I, adenosine to inosine
- A-to-I editing
- ADAR
- ADAR editing sites
- AGS, Aicardi-Goutières Syndrome
- BEB, Bayes Empirical Bayes
- BS-REL, branch site-random effects likelihood
- CEU, Europeans
- CHBJPT, Chinese plus Japanese
- DAF, derived allele frequency
- DIND, Derived Intra-allelic Nucleotide Diversity
- DSH, dyschromatosis symmetrica hereditaria
- FDR, false discovery rate
- GARD, Genetic Algorithm Recombination Detection
- GERP, Genomic Evolutionary Rate Profiling
- IFN, Interferon
- LD, linkage disequilibrium
- LRT, likelihood ratio test
- MAF, minor allele frequency
- MEME, Mixed Effects Model of Evolution
- RBD, dsRNA binding domain
- SLAC, single-likelihood ancestor counting
- YRI, Yoruba
- eQTL, Expression quantitative trait loci
- evolutionary analysis
- iHS, Integrated Haplotype Score
- positive selection
Affiliation(s)
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E. MEDEA, Bosisio Parini, Italy
41
Forni D, Pontremoli C, Cagliani R, Pozzoli U, Clerici M, Sironi M. Positive selection underlies the species-specific binding of Plasmodium falciparum RH5 to human basigin. Mol Ecol 2015; 24:4711-22. [PMID: 26302433 DOI: 10.1111/mec.13354] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/11/2015] [Revised: 08/04/2015] [Accepted: 08/19/2015] [Indexed: 12/12/2022]
Abstract
Plasmodium falciparum, the causative agent of the deadliest form of malaria, is a member of the Laverania subgenus, which includes ape-infecting parasites. P. falciparum is thought to have originated in gorillas, although infection is now restricted to humans. Laverania parasites display remarkable host-specificity, which is partially mediated by the interaction between parasite ligands and host receptors. We analyse the evolution of BSG (basigin) and GYPA (glycophorin A) in primates/hominins, as well as of their Plasmodium-encoded ligands, PfRH5 and PfEBA175. We show that, in primates, positive selection targeted two sites in BSG (F27 and H102), both involved in PfRH5 binding. A population genetics-phylogenetics approach detected the strongest selection for the gorilla lineage: one of the positively selected sites (K191) is a major determinant of PfRH5 binding affinity. Analysis of RH5 genes indicated episodic selection on the P. falciparum branch; the positively selected W447 site is known to stabilize the interaction with human basigin. Conversely, we detect no selection in the receptor-binding region of EBA175 in the P. falciparum lineage. Its host receptor, GYPA, shows evidence of positive selection in all hominid lineages; selected codons include glycosylation sites that modulate PfEBA175 binding affinity. Data herein provide an evolutionary explanation for species-specific binding of the PfRH5-BSG ligand-receptor pair and support the hypothesis that positive selection at these genes drove the host shift leading to the emergence of P. falciparum as a human pathogen.
Affiliation(s)
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842, Bosisio Parini, Italy
- Chiara Pontremoli
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842, Bosisio Parini, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842, Bosisio Parini, Italy
- Uberto Pozzoli
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842, Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, 20090, Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, 20148, Milan, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, 23842, Bosisio Parini, Italy
42
Pontremoli C, Mozzi A, Forni D, Cagliani R, Pozzoli U, Menozzi G, Vertemara J, Bresolin N, Clerici M, Sironi M. Natural Selection at the Brush-Border: Adaptations to Carbohydrate Diets in Humans and Other Mammals. Genome Biol Evol 2015; 7:2569-84. [PMID: 26319403 PMCID: PMC4607523 DOI: 10.1093/gbe/evv166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 02/06/2023] Open
Abstract
Dietary shifts can drive molecular evolution in mammals and a major transition in human history, the agricultural revolution, favored carbohydrate consumption. We investigated the evolutionary history of nine genes encoding brush-border proteins involved in carbohydrate digestion/absorption. Results indicated widespread adaptive evolution in mammals, with several branches experiencing episodic selection, particularly strong in bats. Many positively selected sites map to functional protein regions (e.g., within glucosidase catalytic crevices), with parallel evolution at SI (sucrase-isomaltase) and MGAM (maltase-glucoamylase). In human populations, five genes were targeted by positive selection acting on noncoding variants within regulatory elements. Analysis of ancient DNA samples indicated that most derived alleles were already present in the Paleolithic. Positively selected variants at SLC2A5 (fructose transporter) were an exception and possibly spread following the domestication of specific fruit crops. We conclude that agriculture determined no major selective event at carbohydrate metabolism genes in humans, with implications for susceptibility to metabolic disorders.
Affiliation(s)
- Chiara Pontremoli
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Alessandra Mozzi
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Uberto Pozzoli
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Giorgia Menozzi
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Jacopo Vertemara
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Nereo Bresolin
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
  - Dino Ferrari Centre, Department of Physiopathology and Transplantation, University of Milan, Fondazione Ca' Granda IRCCS Ospedale Maggiore Policlinico, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
43
Fijarczyk A, Babik W. Detecting balancing selection in genomes: limits and prospects. Mol Ecol 2015; 24:3529-45. [DOI: 10.1111/mec.13226] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Received: 02/17/2015] [Revised: 04/27/2015] [Accepted: 04/30/2015] [Indexed: 12/17/2022]
Affiliation(s)
- Anna Fijarczyk
  - Institute of Environmental Sciences, Jagiellonian University, Gronostajowa 7, 30-387 Kraków, Poland
- Wiesław Babik
  - Institute of Environmental Sciences, Jagiellonian University, Gronostajowa 7, 30-387 Kraków, Poland
44
Vieira FG, Lassalle F, Korneliussen TS, Fumagalli M. Improving the estimation of genetic distances from Next-Generation Sequencing data. Biol J Linn Soc Lond 2015. [DOI: 10.1111/bij.12511] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Indexed: 12/14/2022]
Affiliation(s)
- Filipe G. Vieira
  - Centre for GeoGenetics and Evogenomics Section, Natural History Museum of Denmark, University of Copenhagen, DK-2100 Copenhagen, Denmark
- Florent Lassalle
  - Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK
- Thorfinn S. Korneliussen
  - Centre for GeoGenetics and Evogenomics Section, Natural History Museum of Denmark, University of Copenhagen, DK-2100 Copenhagen, Denmark
- Matteo Fumagalli
  - Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK
45
Mozzi A, Pontremoli C, Forni D, Clerici M, Pozzoli U, Bresolin N, Cagliani R, Sironi M. OASes and STING: adaptive evolution in concert. Genome Biol Evol 2015; 7:1016-32. [PMID: 25752600 PMCID: PMC4419793 DOI: 10.1093/gbe/evv046] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 01/09/2023] Open
Abstract
OAS (2′–5′-oligoadenylate synthases) proteins and cyclic GMP–AMP synthase (cGAS, gene symbol: MB21D1) patrol the cytoplasm for the presence of foreign nucleic acids. Upon binding to double-stranded RNA or double-stranded DNA, OAS proteins and cGAS produce nucleotide second messengers to activate RNase L and STING (stimulator of interferon genes, gene symbol: TMEM173), respectively; this leads to the initiation of antiviral responses. We analyzed the evolutionary history of the MB21D1–TMEM173 and OAS–RNASEL axes in primates and bats and found evidence of widespread positive selection in both orders. In TMEM173, residue 230, a major determinant of response to natural ligands and to mimetic drugs (e.g., DMXAA), was positively selected in Primates and Chiroptera. In both orders, selection also targeted an α-helix/loop element in RNase L that modulates the enzyme preference for single-stranded RNA versus stem loops. Analysis of positively selected sites in OAS1, OAS2, and MB21D1 revealed parallel evolution, with the corresponding residues being selected in different genes. As this cannot result from gene conversion, these data suggest that selective pressure acting on OAS and MB21D1 genes is related to nucleic acid recognition and to the specific mechanism of enzyme activation, which requires a conformational change. Finally, a population genetics-phylogenetics analysis in humans, chimpanzees, and gorillas detected several positively selected sites in most genes. Data herein shed light into species-specific differences in infection susceptibility and in response to synthetic compounds, with relevance for the design of synthetic compounds as vaccine adjuvants.
Affiliation(s)
- Alessandra Mozzi
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Chiara Pontremoli
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Diego Forni
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Mario Clerici
  - Department of Physiopathology and Transplantation, University of Milan, Italy
  - Don C. Gnocchi Foundation ONLUS, IRCCS, Milan, Italy
- Uberto Pozzoli
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Nereo Bresolin
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
  - Department of Physiopathology and Transplantation, Dino Ferrari Centre, University of Milan, Fondazione Ca' Granda IRCCS Ospedale Maggiore Policlinico, Milan, Italy
- Rachele Cagliani
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
46
Cagliani R, Forni D, Biasin M, Comabella M, Guerini FR, Riva S, Pozzoli U, Agliardi C, Caputo D, Malhotra S, Montalban X, Bresolin N, Clerici M, Sironi M. Ancient and recent selective pressures shaped genetic diversity at AIM2-like nucleic acid sensors. Genome Biol Evol 2015; 6:830-45. [PMID: 24682156 PMCID: PMC4007548 DOI: 10.1093/gbe/evu066] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 12/28/2022] Open
Abstract
AIM2-like receptors (ALRs) are a family of nucleic acid sensors essential for innate immune responses against viruses and bacteria. We performed an evolutionary analysis of ALR genes (MNDA, PYHIN1, IFI16, and AIM2) by analyzing inter- and intraspecies diversity. Maximum-likelihood analyses indicated that IFI16 and AIM2 evolved adaptively in primates, with branch-specific selection at the catarrhini lineage for IFI16. Application of a population genetics–phylogenetics approach also allowed identification of positive selection events in the human lineage. Positive selection in primates targeted sites located at the DNA-binding interface in both IFI16 and AIM2. In IFI16, several sites positively selected in primates and in the human lineage were located in the PYD domain, which is involved in protein–protein interaction and is bound by a human cytomegalovirus immune evasion protein. Finally, positive selection was found to target nuclear localization signals in IFI16 and the spacer region separating the two HIN domains. Population genetic analysis in humans revealed that an IFI16 genic region has been a target of long-standing balancing selection, possibly acting on two nonsynonymous polymorphisms located in the spacer region. Data herein indicate that ALRs have been repeatedly targeted by natural selection. The balancing selection region in IFI16 carries a variant with opposite risk effect for distinct autoimmune diseases, suggesting antagonistic pleiotropy. We propose that the underlying scenario is the result of an ancestral and still ongoing host–pathogen arms race and that the maintenance of susceptibility alleles for autoimmune diseases at IFI16 represents an evolutionary trade-off.
Affiliation(s)
- Rachele Cagliani
  - Bioinformatics Laboratory, Scientific Institute IRCCS E. Medea, Bosisio Parini (LC), Italy
47
Gulko B, Hubisz MJ, Gronau I, Siepel A. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 2015; 47:276-83. [PMID: 25599402 PMCID: PMC4342276 DOI: 10.1038/ng.3196] [Citation(s) in RCA: 182] [Impact Index Per Article: 20.2] [Received: 08/13/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct “fingerprints” based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2–7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
Affiliation(s)
- Brad Gulko
  - Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Melissa J Hubisz
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Ilan Gronau
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Adam Siepel
  - Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
48
Babik W, Dudek K, Fijarczyk A, Pabijan M, Stuglik M, Szkotak R, Zieliński P. Constraint and adaptation in newt toll-like receptor genes. Genome Biol Evol 2014; 7:81-95. [PMID: 25480684 PMCID: PMC4316619 DOI: 10.1093/gbe/evu266] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 01/01/2023] Open
Abstract
Acute die-offs of amphibian populations worldwide have been linked to the emergence of viral and fungal diseases. Inter and intraspecific immunogenetic differences may influence the outcome of infection. Toll-like receptors (TLRs) are an essential component of innate immunity and also prime acquired defenses. We report the first comprehensive assessment of TLR gene variation for urodele amphibians. The Lissotriton newt TLR repertoire includes representatives of 13 families and is compositionally most similar to that of the anuran Xenopus. Both ancient and recent gene duplications have occurred in urodeles, bringing the total number of TLR genes to at least 21. Purifying selection has predominated the evolution of newt TLRs in both long (∼70 Ma) and medium (∼18 Ma) timescales. However, we find evidence for both purifying and positive selection acting on TLRs in two recently diverged (2-5 Ma) allopatric evolutionary lineages (Lissotriton montandoni and L. vulgaris graecus). Overall, both forms of selection have been stronger in L. v. graecus, while constraint on most TLR genes in L. montandoni appears relaxed. The differences in selection regimes are unlikely to be biased by demographic effects because these were controlled by means of a historical demographic model derived from an independent data set of 62 loci. We infer that TLR genes undergo distinct trajectories of adaptive evolution in closely related amphibian lineages, highlight the potential of TLRs to capture the signatures of different assemblages of pathogenic microorganisms, and suggest differences between lineages in the relative roles of innate and acquired immunity.
Affiliation(s)
- Wiesław Babik
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Katarzyna Dudek
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Anna Fijarczyk
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Maciej Pabijan
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Michał Stuglik
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Rafał Szkotak
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
- Piotr Zieliński
  - Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland
49
Approximation to the distribution of fitness effects across functional categories in human segregating polymorphisms. PLoS Genet 2014; 10:e1004697. [PMID: 25375159 PMCID: PMC4222666 DOI: 10.1371/journal.pgen.1004697] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 01/06/2014] [Accepted: 08/22/2014] [Indexed: 02/03/2023] Open
Abstract
Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral or not strongly selected, and we do not rely on fitting the DFE of all new nonsynonymous mutations to a single probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this and other conservation scores to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model on SNP data. Our method serves to approximate the deleterious DFE of mutations that are segregating, regardless of their genomic consequence. We can then compare the proportion of mutations that are negatively selected or neutral across various categories, including different types of regulatory sites. We observe that the distribution of intergenic polymorphisms is highly peaked at neutrality, while the distribution of nonsynonymous polymorphisms has a second peak at [Formula: see text]. Other types of polymorphisms have shapes that fall roughly in between these two. We find that transcriptional start sites, strong CTCF-enriched elements and enhancers are the regulatory categories with the largest proportion of deleterious polymorphisms.
|
50
|
Mozzi A, Forni D, Cagliani R, Pozzoli U, Vertemara J, Bresolin N, Sironi M. Albuminoid genes: evolving at the interface of dispensability and selection. Genome Biol Evol 2014; 6:2983-97. [PMID: 25349266 PMCID: PMC4255767 DOI: 10.1093/gbe/evu235] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]
Abstract
The albuminoid gene family comprises vitamin D-binding protein (GC), alpha-fetoprotein (AFP), afamin (AFM), and albumin (ALB). Albumin is the most abundant human serum protein and, like the other family members, acts as a transporter of endogenous and exogenous substances including thyroxine, fatty acids, and drugs. In contrast, the major cargo of GC is 25-hydroxyvitamin D. We performed an evolutionary study of albuminoid genes and show that ALB evolved adaptively in mammals. Most positively selected sites are located within albumin-binding sites for fatty acids and thyroxine, as well as at the contact surface with the neonatal Fc receptor. Positive selection was also detected for residues forming the prostaglandin-binding pocket. Adaptation to hibernation/torpor might explain the signatures of episodic positive selection we detected for a few mammalian lineages. Application of a population genetics-phylogenetics approach showed that purifying selection represented a major force acting on albuminoid genes in both humans and chimpanzees, with the strongest constraint observed for human GC. Population genetic analysis revealed that GC was also the target of locally exerted selective pressure, which drove the frequency increase of different haplotypes in distinct human populations. A search for known variants that modulate GC and 25-hydroxyvitamin D concentrations revealed linkage disequilibrium with positively selected variants, although the major European and Asian GC haplotypes carry alleles with opposite reported effects on GC concentration. The data herein indicate that albumin, an extremely abundant housekeeping protein, was the target of pervasive and episodic selection in mammals, whereas GC represented a selection target during the recent evolution of human populations.
Affiliation(s)
- Alessandra Mozzi: Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Diego Forni: Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Rachele Cagliani: Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Uberto Pozzoli: Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Jacopo Vertemara: Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
- Nereo Bresolin: Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy; Dino Ferrari Centre, Department of Physiopathology and Transplantation, University of Milan, Fondazione Ca' Granda IRCCS Ospedale Maggiore Policlinico, Milano, Italy
- Manuela Sironi: Bioinformatics, Scientific Institute IRCCS E.MEDEA, Bosisio Parini, Italy
|