1. Yan H, Hu Z, Thomas GWC, Edwards SV, Sackton TB, Liu JS. PhyloAcc-GT: A Bayesian Method for Inferring Patterns of Substitution Rate Shifts on Targeted Lineages Accounting for Gene Tree Discordance. Mol Biol Evol 2023; 40:msad195. [PMID: 37665177 PMCID: PMC10540510 DOI: 10.1093/molbev/msad195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 08/15/2023] [Accepted: 09/01/2023] [Indexed: 09/05/2023]
Abstract
An important goal of evolutionary genomics is to identify genomic regions whose substitution rates differ among lineages. For example, genomic regions experiencing accelerated molecular evolution in some lineages may provide insight into links between genotype and phenotype. Several comparative genomics methods have been developed to identify genomic accelerations between species, including a Bayesian method called PhyloAcc, which models shifts in substitution rate in multiple target lineages on a phylogeny. However, few methods consider the possibility of discordance between the trees of individual loci and the species tree due to incomplete lineage sorting, which might cause false positives. Here, we present PhyloAcc-GT, which extends PhyloAcc by modeling gene tree heterogeneity. Given a species tree, we adopt the multispecies coalescent model as the prior distribution of gene trees, use Markov chain Monte Carlo (MCMC) for inference, and design novel MCMC moves to sample gene trees efficiently. Through extensive simulations, we show that PhyloAcc-GT outperforms PhyloAcc and other methods in identifying target lineage-specific accelerations and detecting complex patterns of rate shifts, and is robust to specification of population size parameters. PhyloAcc-GT is usually more conservative than PhyloAcc in calling convergent rate shifts because it identifies more accelerations on ancestral than on terminal branches. We apply PhyloAcc-GT to two examples of convergent evolution: flightlessness in ratites and marine mammal adaptations, and show that PhyloAcc-GT is a robust tool to identify shifts in substitution rate associated with specific target lineages while accounting for incomplete lineage sorting.
Affiliation(s)
- Han Yan
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Zhirui Hu
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
- Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Jun S Liu
- Department of Statistics, Harvard University, Cambridge, MA, USA
2. Molecular evolution and the decline of purifying selection with age. Nat Commun 2021; 12:2657. [PMID: 33976227 PMCID: PMC8113359 DOI: 10.1038/s41467-021-22981-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/22/2020] [Accepted: 04/06/2021] [Indexed: 12/18/2022]
Abstract
Life history theory predicts that the intensity of selection declines with age, and this trend should impact how genes expressed at different ages evolve. Here we find consistent relationships between a gene’s age of expression and patterns of molecular evolution in two mammals (the human Homo sapiens and the mouse Mus musculus) and two insects (the malaria mosquito Anopheles gambiae and the fruit fly Drosophila melanogaster). When expressed later in life, genes fix nonsynonymous mutations more frequently, are more polymorphic for nonsynonymous mutations, and have shorter evolutionary lifespans, relative to those expressed early. The latter pattern is explained by a simple evolutionary model. Further, early-expressed genes tend to be enriched in similar gene ontology terms across species, while late-expressed genes show no such consistency. In humans, late-expressed genes are more likely to be linked to cancer and to segregate for dominant disease-causing mutations. Last, the effective strength of selection (Nₑs) decreases and the fraction of beneficial mutations increases with a gene’s age of expression. These results are consistent with the diminishing efficacy of purifying selection with age, as proposed by Medawar’s classic hypothesis for the evolution of senescence, and provide links between life history theory and molecular evolution. A fundamental principle of evolutionary theory is that the force of natural selection is weaker on traits expressed late in life relative to traits expressed early. Here, the authors find strong and consistent patterns of molecular evolution reflecting this principle in four species of animals, including humans.
3. Barba-Montoya J, Tao Q, Kumar S. Using a GTR+Γ substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated. Bioinformatics 2021; 36:i884-i894. [PMID: 33381826 DOI: 10.1093/bioinformatics/btaa820] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 09/07/2020] [Indexed: 11/15/2022]
Abstract
MOTIVATION: As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates.
RESULTS: We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations.
AVAILABILITY AND IMPLEMENTATION: All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.
Affiliation(s)
- Jose Barba-Montoya
- Institute for Genomics and Evolutionary Medicine; Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Qiqing Tao
- Institute for Genomics and Evolutionary Medicine; Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine; Department of Biology, Temple University, Philadelphia, PA 19122, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
4. Kapun M, Barrón MG, Staubach F, Obbard DJ, Wiberg RAW, Vieira J, Goubert C, Rota-Stabelli O, Kankare M, Bogaerts-Márquez M, Haudry A, Waidele L, Kozeretska I, Pasyukova EG, Loeschcke V, Pascual M, Vieira CP, Serga S, Montchamp-Moreau C, Abbott J, Gibert P, Porcelli D, Posnien N, Sánchez-Gracia A, Grath S, Sucena É, Bergland AO, Guerreiro MPG, Onder BS, Argyridou E, Guio L, Schou MF, Deplancke B, Vieira C, Ritchie MG, Zwaan BJ, Tauber E, Orengo DJ, Puerma E, Aguadé M, Schmidt P, Parsch J, Betancourt AJ, Flatt T, González J. Genomic Analysis of European Drosophila melanogaster Populations Reveals Longitudinal Structure, Continent-Wide Selection, and Previously Unknown DNA Viruses. Mol Biol Evol 2020; 37:2661-2678. [PMID: 32413142 PMCID: PMC7475034 DOI: 10.1093/molbev/msaa120] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Indexed: 12/19/2022]
Abstract
Genetic variation is the fuel of evolution, with standing genetic variation especially important for short-term evolution and local adaptation. To date, studies of spatiotemporal patterns of genetic variation in natural populations have been challenging, as comprehensive sampling is logistically difficult, and sequencing of entire populations costly. Here, we address these issues using a collaborative approach, sequencing 48 pooled population samples from 32 locations, and perform the first continent-wide genomic analysis of genetic variation in European Drosophila melanogaster. Our analyses uncover longitudinal population structure, provide evidence for continent-wide selective sweeps, identify candidate genes for local climate adaptation, and document clines in chromosomal inversion and transposable element frequencies. We also characterize variation among populations in the composition of the fly microbiome, and identify five new DNA viruses in our samples.
Affiliation(s)
- Martin Kapun
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Department of Evolutionary Biology and Environmental Sciences, University of Zürich, Zürich, Switzerland
- Division of Cell and Developmental Biology, Medical University of Vienna, Vienna, Austria
- Maite G Barrón
- The European Drosophila Population Genomics Consortium (DrosEU)
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- Fabian Staubach
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Evolutionary Biology and Ecology, University of Freiburg, Freiburg, Germany
- Darren J Obbard
- The European Drosophila Population Genomics Consortium (DrosEU)
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom
- R Axel W Wiberg
- The European Drosophila Population Genomics Consortium (DrosEU)
- Centre for Biological Diversity, School of Biology, University of St. Andrews, St Andrews, Scotland
- Department of Environmental Sciences, Zoological Institute, University of Basel, Basel, Switzerland
- Jorge Vieira
- The European Drosophila Population Genomics Consortium (DrosEU)
- Instituto de Biologia Molecular e Celular (IBMC), University of Porto, Porto, Portugal
- Instituto de Investigação e Inovação em Saúde (I3S), University of Porto, Porto, Portugal
- Clément Goubert
- The European Drosophila Population Genomics Consortium (DrosEU)
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, CNRS, Université Lyon 1, Université de Lyon, Villeurbanne, France
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY
- Omar Rota-Stabelli
- The European Drosophila Population Genomics Consortium (DrosEU)
- Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy
- Maaria Kankare
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
- María Bogaerts-Márquez
- The European Drosophila Population Genomics Consortium (DrosEU)
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- Annabelle Haudry
- The European Drosophila Population Genomics Consortium (DrosEU)
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, CNRS, Université Lyon 1, Université de Lyon, Villeurbanne, France
- Lena Waidele
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Evolutionary Biology and Ecology, University of Freiburg, Freiburg, Germany
- Iryna Kozeretska
- The European Drosophila Population Genomics Consortium (DrosEU)
- General and Medical Genetics Department, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- State Institution National Antarctic Scientific Center of Ministry of Education and Science of Ukraine, Kyiv, Ukraine
- Elena G Pasyukova
- The European Drosophila Population Genomics Consortium (DrosEU)
- Laboratory of Genome Variation, Institute of Molecular Genetics of RAS, Moscow, Russia
- Volker Loeschcke
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Bioscience—Genetics, Ecology and Evolution, Aarhus University, Aarhus C, Denmark
- Marta Pascual
- The European Drosophila Population Genomics Consortium (DrosEU)
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Cristina P Vieira
- The European Drosophila Population Genomics Consortium (DrosEU)
- Instituto de Biologia Molecular e Celular (IBMC), University of Porto, Porto, Portugal
- Instituto de Investigação e Inovação em Saúde (I3S), University of Porto, Porto, Portugal
- Svitlana Serga
- The European Drosophila Population Genomics Consortium (DrosEU)
- General and Medical Genetics Department, Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
- Catherine Montchamp-Moreau
- The European Drosophila Population Genomics Consortium (DrosEU)
- Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, 91198, Gif-sur-Yvette, France
- Jessica Abbott
- The European Drosophila Population Genomics Consortium (DrosEU)
- Section for Evolutionary Ecology, Department of Biology, Lund University, Lund, Sweden
- Patricia Gibert
- The European Drosophila Population Genomics Consortium (DrosEU)
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, CNRS, Université Lyon 1, Université de Lyon, Villeurbanne, France
- Damiano Porcelli
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Animal and Plant Sciences, Sheffield, United Kingdom
- Nico Posnien
- The European Drosophila Population Genomics Consortium (DrosEU)
- Johann-Friedrich-Blumenbach-Institut für Zoologie und Anthropologie, Universität Göttingen, Göttingen, Germany
- Alejandro Sánchez-Gracia
- The European Drosophila Population Genomics Consortium (DrosEU)
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Sonja Grath
- The European Drosophila Population Genomics Consortium (DrosEU)
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg, Germany
- Élio Sucena
- The European Drosophila Population Genomics Consortium (DrosEU)
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Departamento de Biologia Animal, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal
- Alan O Bergland
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Biology, University of Virginia, Charlottesville, VA
- Maria Pilar Garcia Guerreiro
- The European Drosophila Population Genomics Consortium (DrosEU)
- Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
- Banu Sebnem Onder
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Biology, Faculty of Science, Hacettepe University, Ankara, Turkey
- Eliza Argyridou
- The European Drosophila Population Genomics Consortium (DrosEU)
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg, Germany
- Lain Guio
- The European Drosophila Population Genomics Consortium (DrosEU)
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
- Mads Fristrup Schou
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Bioscience—Genetics, Ecology and Evolution, Aarhus University, Aarhus C, Denmark
- Section for Evolutionary Ecology, Department of Biology, Lund University, Lund, Sweden
- Bart Deplancke
- The European Drosophila Population Genomics Consortium (DrosEU)
- Institute of Bio-engineering, School of Life Sciences, EPFL, Lausanne, Switzerland
- Cristina Vieira
- The European Drosophila Population Genomics Consortium (DrosEU)
- Laboratoire de Biométrie et Biologie Evolutive UMR 5558, CNRS, Université Lyon 1, Université de Lyon, Villeurbanne, France
- Michael G Ritchie
- The European Drosophila Population Genomics Consortium (DrosEU)
- Centre for Biological Diversity, School of Biology, University of St. Andrews, St Andrews, Scotland
- Bas J Zwaan
- The European Drosophila Population Genomics Consortium (DrosEU)
- Laboratory of Genetics, Department of Plant Sciences, Wageningen University, Wageningen, Netherlands
- Eran Tauber
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Institute of Evolution, University of Haifa, Haifa, Israel
- Dorcas J Orengo
- The European Drosophila Population Genomics Consortium (DrosEU)
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Eva Puerma
- The European Drosophila Population Genomics Consortium (DrosEU)
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Montserrat Aguadé
- The European Drosophila Population Genomics Consortium (DrosEU)
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Paul Schmidt
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Biology, University of Pennsylvania, Philadelphia, PA
- John Parsch
- The European Drosophila Population Genomics Consortium (DrosEU)
- Division of Evolutionary Biology, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg, Germany
- Andrea J Betancourt
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Evolution, Ecology, and Behaviour, University of Liverpool, Liverpool, United Kingdom
- Thomas Flatt
- The European Drosophila Population Genomics Consortium (DrosEU)
- Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Department of Biology, University of Fribourg, Fribourg, Switzerland
- Josefa González
- The European Drosophila Population Genomics Consortium (DrosEU)
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain
5. Guillén Y, Casillas S, Ruiz A. Genome-Wide Patterns of Sequence Divergence of Protein-Coding Genes Between Drosophila buzzatii and D. mojavensis. J Hered 2019; 110:92-101. [PMID: 30124907 DOI: 10.1093/jhered/esy041] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/23/2018] [Accepted: 08/14/2018] [Indexed: 12/15/2022]
Abstract
Evolutionary rates for protein-coding genes are determined not only by natural selection but also by multiple genomic factors including mutation rates, recombination, gene expression levels, and chromosomal location. To investigate the joint effects of different genomic determinants on protein evolution, we compared the coding sequences of 9017 single-copy orthologs between 2 cactophilic species from the Drosophila subgenus, Drosophila mojavensis and D. buzzatii, whose genomes have been previously sequenced. We assessed the impact of 7 genomic determinants, that is, chromosome type, recombination, chromosomal inversions, expression breadth, expression level, gene length, and the number of exons, on divergence rates of protein-coding genes to understand patterns of evolutionary variation. Integrative analysis of these factors revealed that 1) X-linked and autosomal genes evolve at significantly different rates in agreement with the faster-X hypothesis, 2) genes located on the dot chromosome and pericentromeric regions have higher divergence rates, 3) genes located at chromosomes with more fixed inversions have higher pairwise divergence than those located at nearly collinear chromosomes, and 4) gene expression patterns can be considered the strongest determinant of protein evolution. In addition, the number of exons and protein length had a significant effect on pairwise divergence at synonymous sites. All in all, our results show the relative importance of each genomic factor on the rates of protein evolution and functional constraint in these 2 cactophilic Drosophila species.
Affiliation(s)
- Yolanda Guillén
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Sònia Casillas
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain; The Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
- Alfredo Ruiz
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
6. Distinguishing Among Evolutionary Forces Acting on Genome-Wide Base Composition: Computer Simulation Analysis of Approximate Methods for Inferring Site Frequency Spectra of Derived Mutations. G3: Genes, Genomes, Genetics 2018; 8:1755-1769. [PMID: 29588382 PMCID: PMC5940166 DOI: 10.1534/g3.117.300512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Inferred ancestral nucleotide states are increasingly employed in analyses of within- and between-species genome variation. Although numerous studies have focused on ancestral inference among distantly related lineages, approaches to infer ancestral states in polymorphism data have received less attention. Recently developed approaches that employ complex transition matrices allow us to infer ancestral nucleotide sequence in various evolutionary scenarios of base composition. However, the requirement of a single gene tree to calculate a likelihood is an important limitation for conducting ancestral inference using within-species variation in recombining genomes. To resolve this problem, and to extend the applicability of ancestral inference in studies of base composition evolution, we first evaluate three previously proposed methods to infer ancestral nucleotide sequences among within- and between-species sequence variation data. The methods employ a single allele, a bifurcating tree, or a star tree for within-species variation data. Using simulated nucleotide sequences, we employ ancestral inference to infer fixations and polymorphisms. We find that all three methods show biased inference. We modify the bifurcating tree method to include weights to adjust for an expected site frequency spectrum, “bifurcating tree with weighting” (BTW). Our simulation analyses show that the BTW method can substantially improve the reliability and robustness of ancestral inference in a range of scenarios that include non-neutral and/or non-stationary base composition evolution.
7. Jackson BC, Campos JL, Haddrill PR, Charlesworth B, Zeng K. Variation in the Intensity of Selection on Codon Bias over Time Causes Contrasting Patterns of Base Composition Evolution in Drosophila. Genome Biol Evol 2017; 9:102-123. [PMID: 28082609 PMCID: PMC5381600 DOI: 10.1093/gbe/evw291] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Accepted: 12/07/2016] [Indexed: 12/11/2022]
Abstract
Four-fold degenerate coding sites form a major component of the genome, and are often used to make inferences about selection and demography, so that understanding their evolution is important. Despite previous efforts, many questions regarding the causes of base composition changes at these sites in Drosophila remain unanswered. To shed further light on this issue, we obtained a new whole-genome polymorphism data set from D. simulans. We analyzed samples from the putatively ancestral range of D. simulans, as well as an existing polymorphism data set from an African population of D. melanogaster. By using D. yakuba as an outgroup, we found clear evidence for selection on 4-fold sites along both lineages over a substantial period, with the intensity of selection increasing with GC content. Based on an explicit model of base composition evolution, we suggest that the observed AT-biased substitution pattern in both lineages is probably due to an ancestral reduction in selection intensity, and is unlikely to be the result of an increase in mutational bias towards AT alone. By using two polymorphism-based methods for estimating selection coefficients over different timescales, we show that the selection intensity on codon usage has been rather stable in D. simulans in the recent past, but the long-term estimates in D. melanogaster are much higher than the short-term ones, indicating a continuing decline in selection intensity, to such an extent that the short-term estimates suggest that selection is only active in the most GC-rich parts of the genome. Finally, we provide evidence for complex evolutionary patterns in the putatively neutral short introns, which cannot be explained by the standard GC-biased gene conversion model. These results reveal a dynamic picture of base composition evolution.
Affiliation(s)
- Benjamin C Jackson
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
- José L Campos
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Penelope R Haddrill
- Centre for Forensic Science, Department of Pure and Applied Chemistry, University of Strathclyde, Glasgow, United Kingdom
- Brian Charlesworth
- Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Kai Zeng
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, United Kingdom
8. Gursky VV, Kozlov KN, Kulakovskiy IV, Zubair A, Marjoram P, Lawrie DS, Nuzhdin SV, Samsonova MG. Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network. PLoS One 2017; 12:e0184657. [PMID: 28898266 PMCID: PMC5595321 DOI: 10.1371/journal.pone.0184657] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/05/2017] [Accepted: 08/28/2017] [Indexed: 11/18/2022]
Abstract
Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of natural sequenced D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects.
Affiliation(s)
- Vitaly V. Gursky
- Theoretical Department, Ioffe Institute, Saint Petersburg, Russia
- Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
- Konstantin N. Kozlov
- Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
- Ivan V. Kulakovskiy
- Engelhardt Institute of Molecular Biology, Moscow, Russia
- Vavilov Institute of General Genetics, Moscow, Russia
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow, Russia
- Asif Zubair
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Paul Marjoram
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- David S. Lawrie
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Sergey V. Nuzhdin
- Molecular and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Maria G. Samsonova
- Systems Biology and Bioinformatics Laboratory, Peter the Great Saint Petersburg Polytechnic University, Saint Petersburg, Russia
9. Choi JY, Aquadro CF. Recent and Long-Term Selection Across Synonymous Sites in Drosophila ananassae. J Mol Evol 2016; 83:50-60. [PMID: 27481397 DOI: 10.1007/s00239-016-9753-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/20/2015] [Accepted: 07/23/2016] [Indexed: 11/28/2022]
Abstract
In Drosophila, many studies have examined the short- or long-term evolution occurring across synonymous sites. Few, however, have examined both the recent and long-term evolution to gain a complete view of this selection. Here we have analyzed Drosophila ananassae DNA polymorphism and divergence data using several different methods, and have identified evidence of positive selection favoring preferred codons on both recent and long-term evolutionary time scales. Further, in D. ananassae, selection for preferred codons was stronger on the X chromosome than on the autosomes. We show that this stronger selection is not due to higher gene expression of X-linked genes. Analysis of the selectively neutral introns indicated that the X chromosome also had a preference for GC over AT nucleotides, potentially from GC-biased gene conversion (gcBGC), which can also affect the base composition of synonymous sites. Thus, selection for preferred codons and gcBGC both seem to be partially responsible for shaping synonymous site evolution in D. ananassae.
Affiliation(s)
- Jae Young Choi
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
- Charles F Aquadro
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, USA
|
10
|
Testing heterogeneous base composition as potential cause for conflicting phylogenetic signal between mitochondrial and nuclear DNA in the land snail genus Theba Risso 1826 (Gastropoda: Stylommatophora: Helicoidea). Org Divers Evol 2016. [DOI: 10.1007/s13127-016-0288-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/26/2023]
|
11
|
Matsumoto T, John A, Baeza-Centurion P, Li B, Akashi H. Codon Usage Selection Can Bias Estimation of the Fraction of Adaptive Amino Acid Fixations. Mol Biol Evol 2016; 33:1580-9. [PMID: 26873577 DOI: 10.1093/molbev/msw027] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023] Open
Abstract
A growing number of molecular evolutionary studies are estimating the proportion of adaptive amino acid substitutions (α) from comparisons of ratios of polymorphic and fixed DNA mutations. Here, we examine how violations of two of the model assumptions, neutral evolution of synonymous mutations and stationary base composition, affect α estimation. We simulated the evolution of coding sequences assuming weak selection on synonymous codon usage bias and neutral protein evolution, α = 0. We show that weak selection on synonymous mutations can give polymorphism/divergence ratios that yield α-hat (estimated α) considerably larger than its true value. Nonstationary evolution (changes in population size, selection, or mutation) can exacerbate such biases or, in some scenarios, give biases in the opposite direction, α-hat < α. These results demonstrate that two factors that appear to be prevalent among taxa, weak selection on synonymous mutations and non-steady-state nucleotide composition, should be considered when estimating α. Estimates of the proportion of adaptive amino acid fixations from large-scale analyses of Drosophila melanogaster polymorphism and divergence data are positively correlated with codon usage bias. Such patterns are consistent with α-hat inflation from weak selection on synonymous mutations and/or mutational changes within the examined gene trees.
Affiliation(s)
- Tomotaka Matsumoto
- Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Anoop John
- Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Pablo Baeza-Centurion
- Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Boyang Li
- Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Hiroshi Akashi
- Division of Evolutionary Genetics, National Institute of Genetics, Yata, Mishima, Shizuoka, Japan
- Department of Genetics, The Graduate University for Advanced Studies (SOKENDAI), Yata, Mishima, Shizuoka, Japan
|
12
|
Vogl C, Bergman J. Inference of directional selection and mutation parameters assuming equilibrium. Theor Popul Biol 2015; 106:71-82. [PMID: 26597774 DOI: 10.1016/j.tpb.2015.10.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/10/2015] [Revised: 09/30/2015] [Accepted: 10/07/2015] [Indexed: 01/15/2023]
Abstract
In a classical study, Wright (1931) proposed a model for the evolution of a biallelic locus under the influence of mutation, directional selection and drift. He derived the equilibrium distribution of the allelic proportion conditional on the scaled mutation rate, the mutation bias and the scaled strength of directional selection. The equilibrium distribution can be used for inference of these parameters with genome-wide datasets of "site frequency spectra" (SFS). Assuming that the scaled mutation rate is low, Wright's model can be approximated by a boundary-mutation model, where mutations are introduced into the population exclusively from sites fixed for the preferred or unpreferred allelic states. With the boundary-mutation model, inference can be partitioned: (i) the shape of the SFS distribution within the polymorphic region is determined by random drift and directional selection, but not by the mutation parameters, such that inference of the selection parameter relies exclusively on the polymorphic sites in the SFS; (ii) the mutation parameters can be inferred from the amount of polymorphic and monomorphic preferred and unpreferred alleles, conditional on the selection parameter. Herein, we derive maximum likelihood estimators for the mutation and selection parameters in equilibrium and apply the method to simulated SFS data as well as empirical data from a Madagascar population of Drosophila simulans.
Affiliation(s)
- Claus Vogl
- Institute of Animal Breeding and Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria
- Juraj Bergman
- Institute of Population Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria
- Vienna Graduate School of Population Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria
|
13
|
Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution. Genetics 2015; 200:873-90. [PMID: 25948563 DOI: 10.1534/genetics.115.177386] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 02/09/2015] [Accepted: 04/28/2015] [Indexed: 01/07/2023] Open
Abstract
Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.
|
14
|
Haerty W, Ponting CP. Unexpected selection to retain high GC content and splicing enhancers within exons of multiexonic lncRNA loci. RNA 2015; 21:333-46. [PMID: 25589248 PMCID: PMC4338330 DOI: 10.1261/rna.047324.114] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 07/16/2014] [Accepted: 11/25/2014] [Indexed: 06/04/2023]
Abstract
If sequencing were possible only for genomes, and not for RNAs or proteins, then functional protein-coding exons would be recognizable by their unusual patterns of nucleotide composition, specifically a high GC content across the body of exons, and an unusual nucleotide content near their edges. RNAs and proteins can, of course, be sequenced, but the extent of functionality of intergenic long noncoding RNAs (lncRNAs) remains under question owing to their low nucleotide conservation. Inspired by the nucleotide composition patterns of protein-coding exons, we sought evidence for functionality across lncRNA loci from diverse species. We found that such patterns across multiexonic lncRNA loci mirror those of protein-coding genes, although to a lesser degree: specifically, compared with introns, lncRNA exons are GC rich. Additionally, we report evidence for the action of purifying selection to preserve exonic splicing enhancers within human multiexonic lncRNAs and nucleotide composition in fruit fly lncRNAs. Our findings provide evidence for selection for more efficient rates of transcription and splicing within lncRNA loci. Despite only a minor proportion of their RNA bases being constrained, multiexonic intergenic lncRNAs appear to require accurate splicing of their exons to transact their function.
|
15
|
Niu M, Tabari ES, Su Z. De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets. BMC Genomics 2014; 15:1047. [PMID: 25442502 PMCID: PMC4265420 DOI: 10.1186/1471-2164-15-1047] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 06/26/2014] [Accepted: 11/19/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective specific cis-regulatory elements (CREs) in the so-called cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial to elucidate gene regulatory networks and understand many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes because they are difficult to characterize by either computational or traditional experimental methods. However, the exponentially increasing volume of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) technologies has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, how to effectively mine these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution is a highly challenging task. RESULTS: We have developed a novel graph-theoretic algorithm, DePCRM, for genome-wide de novo prediction of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets in an effective way. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 and 746 overrepresented CRE motifs and their combinatorial patterns, respectively, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs as well as CREs as a whole in a CRM are more conserved than randomly selected sequences. CONCLUSION: Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.
Affiliation(s)
- Zhengchang Su
- Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
|
16
|
Background selection as baseline for nucleotide variation across the Drosophila genome. PLoS Genet 2014; 10:e1004434. [PMID: 24968283 PMCID: PMC4072542 DOI: 10.1371/journal.pgen.1004434] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.8] [Received: 10/01/2013] [Accepted: 04/28/2014] [Indexed: 11/21/2022] Open
Abstract
The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms at 100-kb scale, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and, thus, local BGS effects that may differ between extant and past phases. Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future analyses should incorporate BGS predictions and capture local recombination variation across genomes and along lineages.
The removal of deleterious mutations from natural populations has potential consequences on patterns of variation across genomes. Population genetic analyses, however, often assume that such effects are negligible across recombining regions of species like Drosophila. We use simple models of purifying selection and current knowledge of recombination rates and gene distribution across the genome to obtain a baseline of variation predicted by the constant input and removal of deleterious mutations. We find that purifying selection alone can explain a major fraction of the observed variance in nucleotide diversity across the genome. The use of a baseline of variation predicted by linkage to deleterious mutations as null expectation exposes genomic regions under other selective regimes, including more regions showing the signature of balancing selection than would be evident when using traditional approaches. Our study also indicates that most, if not all, nucleotides across the D. melanogaster genome are significantly influenced by the removal of deleterious mutations, even when located in the middle of highly recombining regions and distant from genes. Additionally, the study of rates of protein evolution confirms previous analyses suggesting that the recombination landscape across the genome has changed in the recent history of D. melanogaster. All these reported factors can skew current analyses designed to capture demographic events or estimate the strength and frequency of adaptive mutations, and illustrate the need for new and more realistic theoretical and modeling approaches to study naturally occurring genetic variation.
|
17
|
Zhu A, Guo W, Jain K, Mower JP. Unprecedented Heterogeneity in the Synonymous Substitution Rate within a Plant Genome. Mol Biol Evol 2014; 31:1228-36. [DOI: 10.1093/molbev/msu079] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 12/12/2022] Open
|
18
|
Warnecke T, Becker EA, Facciotti MT, Nislow C, Lehner B. Conserved substitution patterns around nucleosome footprints in eukaryotes and Archaea derive from frequent nucleosome repositioning through evolution. PLoS Comput Biol 2013; 9:e1003373. [PMID: 24278010 PMCID: PMC3836710 DOI: 10.1371/journal.pcbi.1003373] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/17/2013] [Accepted: 10/13/2013] [Indexed: 11/21/2022] Open
Abstract
Nucleosomes, the basic repeat units of eukaryotic chromatin, have been suggested to influence the evolution of eukaryotic genomes, both by altering the propensity of DNA to mutate and by selection acting to maintain or exclude nucleosomes in particular locations. Contrary to the popular idea that nucleosomes are unique to eukaryotes, histone proteins have also been discovered in some archaeal genomes. Archaeal nucleosomes, however, are quite unlike their eukaryotic counterparts in many respects, including their assembly into tetramers (rather than octamers) from histone proteins that lack N- and C-terminal tails. Here, we show that despite these fundamental differences the association between nucleosome footprints and sequence evolution is strikingly conserved between humans and the model archaeon Haloferax volcanii. In light of this finding we examine whether selection or mutation can explain concordant substitution patterns in the two kingdoms. Unexpectedly, we find that neither the mutation nor the selection model is sufficient to explain the observed association between nucleosomes and sequence divergence. Instead, we demonstrate that nucleosome-associated substitution patterns are more consistent with a third model where sequence divergence results in frequent repositioning of nucleosomes during evolution. Indeed, we show that nucleosome repositioning is both necessary and largely sufficient to explain the association between current nucleosome positions and biased substitution patterns. This finding highlights the importance of considering the direction of causality between genetic and epigenetic change.
Genome sequences as well as epigenetic states, such as DNA methylation or nucleosome binding patterns, change during evolution. But what is the causal relationship between the two? We already know that nucleotide variation within and between species is distributed unevenly around nucleosome footprints, but does this mean that sequence evolution follows a biased course because the presence of nucleosomes affects mutation and DNA repair dynamics? Or is it, in fact, the other way around, i.e. changes happen at the DNA level and prompt shifts in nucleosome positioning? To investigate the direction of causality in genetic versus epigenetic evolution, we analyze substitution patterns in eukaryotes as well as the archaeon Haloferax volcanii in the context of genome-wide nucleosome binding maps. We demonstrate that the relationship between nucleosome positions and between-species divergence patterns, strikingly similar in eukaryotes and archaea, can be explained in large part by nucleosomes shifting positions in response to substitution, although both mutation and selection biases might still exist. Our results illustrate that it is important to consider the direction of causality between epigenetic and genetic change when analyzing patterns of sequence divergence and using sequence conservation to infer selection on epigenetic states.
Affiliation(s)
- Tobias Warnecke
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Erin A. Becker
- Microbiology Graduate Group, University of California, Davis, Davis, California, United States of America
- Marc T. Facciotti
- Microbiology Graduate Group, University of California, Davis, Davis, California, United States of America
- Department of Biomedical Engineering, University of California, Davis, Davis, California, United States of America
- Genome Center, University of California, Davis, Davis, California, United States of America
- Corey Nislow
- Department of Pharmaceutical Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Ben Lehner
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain
|
19
|
Cordero D, Peña JB, Saavedra C. Phylogeographic analysis of introns and mitochondrial DNA in the clam Ruditapes decussatus uncovers the effects of Pleistocene glaciations and endogenous barriers to gene flow. Mol Phylogenet Evol 2013; 71:274-87. [PMID: 24269315 DOI: 10.1016/j.ympev.2013.11.003] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 11/06/2012] [Revised: 10/30/2013] [Accepted: 11/06/2013] [Indexed: 12/22/2022]
Abstract
Studies on the phylogeography of species inhabiting the Mediterranean and the nearby coasts of the NE Atlantic Ocean (MEDAT) have found subdivision and/or phylogeographic structure in one or more of the Atlantic, western Mediterranean and eastern Mediterranean basins. This structure has been explained as the result of past population fragmentation caused by Pleistocene sea level changes and current patterns of marine circulation. However, the increasing use of nuclear markers has revealed that these two factors alone are not enough to explain the phylogeographic patterns, and an additional role has been suggested for endogenous barriers to gene flow or natural selection. In this article we examined the role of these factors in Ruditapes decussatus, a commercial clam species native to MEDAT. A genetic analysis of 11 populations was carried out by examining 6 introns with a PCR-RFLP technique. We found subdivision in three regions: Atlantic (ATL), western Mediterranean plus Tunisia (WMED), and Aegean and Adriatic seas (AEGAD). Two introns (Ech and Tbp) showed alleles that were restricted to AEGAD. Sequencing a subsample of individuals for these introns indicated that AEGAD-specific alleles were separate clades, thus revealing a phylogeographic break at the WMED-AEGAD boundary. Sequencing of the mitochondrial COI locus confirmed this phylogeographic break. Dating of the AEGAD mitochondrial haplotypes and nuclear alleles with a Bayesian MCMC method revealed that they shared common ancestors in the Pleistocene. These results can be explained in the framework of Pleistocene sea level drops and patterns of gene flow in MEDAT. An additional observation was a lack of differentiation at COI between the ATL and WMED, in sharp contrast with 4 introns that showed clear genetic subdivision. Neutrality tests did not support the hypothesis of a selective sweep acting on mtDNA to explain the contrasting levels of differentiation between mitochondrial and nuclear markers across the ATL-WMED transition, and we argue that the difference between markers is best explained by the existence of an endogenous genetic barrier, rather than by a physical barrier to larval migration alone.
Affiliation(s)
- David Cordero
- Instituto de Acuicultura Torre de la Sal, Consejo Superior de Investigaciones Científicas, 12595 Ribera de Cabanes (Castellón), Spain
- Juan B Peña
- Instituto de Acuicultura Torre de la Sal, Consejo Superior de Investigaciones Científicas, 12595 Ribera de Cabanes (Castellón), Spain
- Carlos Saavedra
- Instituto de Acuicultura Torre de la Sal, Consejo Superior de Investigaciones Científicas, 12595 Ribera de Cabanes (Castellón), Spain
|
20
|
Robinson MC, Stone EA, Singh ND. Population genomic analysis reveals no evidence for GC-biased gene conversion in Drosophila melanogaster. Mol Biol Evol 2013; 31:425-33. [PMID: 24214536 DOI: 10.1093/molbev/mst220] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Indexed: 12/19/2022] Open
Abstract
Gene conversion is the nonreciprocal exchange of genetic material between homologous chromosomes. Multiple lines of evidence from a variety of taxa strongly suggest that gene conversion events are biased toward GC-bearing alleles. However, in Drosophila, the data have largely been indirect and unclear, with some studies supporting the predictions of a GC-biased gene conversion model and other data showing contradictory findings. Here, we test whether gene conversion events are GC-biased in Drosophila melanogaster using whole-genome polymorphism and divergence data. Our results provide no support for GC-biased gene conversion and thus suggest that this process is unlikely to significantly contribute to patterns of polymorphism and divergence in this system.
Affiliation(s)
- Matthew C Robinson
- Department of Biological Sciences, Program in Genetics, North Carolina State University
|
21
|
Rothfels CJ, Schuettpelz E. Accelerated Rate of Molecular Evolution for Vittarioid Ferns is Strong and Not Driven by Selection. Syst Biol 2013; 63:31-54. [DOI: 10.1093/sysbio/syt058] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022] Open
Affiliation(s)
- Carl J. Rothfels
- Department of Biology, Duke University, Box 90338, Durham, NC 27708, USA; Department of Zoology, University of British Columbia, #4200-6270 University Blvd., Vancouver, BC V6T 1Z4, Canada; Department of Biology and Marine Biology, University of North Carolina Wilmington, 601 South College Road, Wilmington, NC 28403, USA; and Department of Botany (MRC 166), National Museum of Natural History, Smithsonian Institution, PO Box 37012, Washington DC 20013-7012, USA
- Eric Schuettpelz
- Department of Biology, Duke University, Box 90338, Durham, NC 27708, USA; Department of Zoology, University of British Columbia, #4200-6270 University Blvd., Vancouver, BC V6T 1Z4, Canada; Department of Biology and Marine Biology, University of North Carolina Wilmington, 601 South College Road, Wilmington, NC 28403, USA; and Department of Botany (MRC 166), National Museum of Natural History, Smithsonian Institution, PO Box 37012, Washington DC 20013-7012, USA
|
22
|
Comparative analysis of context-dependent mutagenesis using human and mouse models. Biomed Res Int 2013; 2013:989410. [PMID: 24058920 PMCID: PMC3766559 DOI: 10.1155/2013/989410] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/18/2013] [Accepted: 07/19/2013] [Indexed: 11/17/2022]
Abstract
Substitution rates strongly depend on their nucleotide context. One of the most studied examples is the excess of C > T mutations in the CG context in various groups of organisms, including vertebrates. Studies on the molecular mechanisms underlying this mutation regularity have provided insights into evolution, mutagenesis, and cancer development. Recently, several other hypermutable motifs were identified in the human genome. There is an increased frequency of T > C mutations in the second position of the words ATTG and ATAG and an increased frequency of A > C mutations in the first position of the word ACAA. For a better understanding of evolution, it is of interest whether these mutation regularities are human specific or present in other vertebrates, as their presence might affect the validity of currently used substitution models and molecular clocks. A comprehensive analysis of mutagenesis in 4 bp mutation contexts requires a vast amount of mutation data. Such data may be derived from the comparisons of individual genomes or from single nucleotide polymorphism (SNP) databases. Using this approach, we performed a systematic comparison of mutation regularities within 2-4 bp contexts in Mus musculus and Homo sapiens and found that even closely related organisms may have notable differences in context-dependent mutation regularities.
|
23
|
Comparative analysis of context-dependent mutagenesis in humans and fruit flies. Int J Genomics 2013; 2013:173616. [PMID: 23984310 PMCID: PMC3747623 DOI: 10.1155/2013/173616] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/01/2013] [Accepted: 07/07/2013] [Indexed: 11/17/2022] Open
Abstract
In general, mutation frequencies are context-dependent: specific adjacent nucleotides may influence the probability to observe a specific type of mutation in a genome. Recently, several hypermutable motifs were identified in the human genome. Namely, there is an increased frequency of T>C mutations in the second position of the words ATTG and ATAG and an increased frequency of A>C mutations in the first position of the word ACAA. Previous studies have also shown that there is a remarkable difference between the mutagenesis of humans and drosophila. While C>T mutations are overrepresented in the CG context in humans (and other vertebrates), this mutation regularity is not observed in Drosophila melanogaster. Such differences in the observed regularities of mutagenesis between representatives of different taxa might reflect differences in the mechanisms involved in mutagenesis. We performed a systematical comparison of mutation regularities within 2-4 bp contexts in Homo sapiens and Drosophila melanogaster and found that the aforementioned contexts are not hypermutable in fruit flies. It seems that most mutation contexts affect mutation rates in a similar manner in H. sapiens and D. melanogaster; however, several important exceptions are noted and discussed.
|
24
|
Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet 2013; 9:e1003527. [PMID: 23737754 PMCID: PMC3667748 DOI: 10.1371/journal.pgen.1003527] [Citation(s) in RCA: 143] [Impact Index Per Article: 13.0] [Received: 01/24/2013] [Accepted: 04/08/2013] [Indexed: 11/19/2022] Open
Abstract
Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in Drosophila melanogaster evolve under very strong selective constraint while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for those positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization generating canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupae, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the observed widespread constraint acting on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.
|
25
|
Inferences of demography and selection in an African population of Drosophila melanogaster. Genetics 2012; 193:215-28. [PMID: 23105013 DOI: 10.1534/genetics.112.145318] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
Abstract
It remains a central problem in population genetics to infer the past action of natural selection, and these inferences pose a challenge because demographic events will also substantially affect patterns of polymorphism and divergence. Thus it is imperative to explicitly model the underlying demographic history of the population whenever making inferences about natural selection. In light of the considerable interest in adaptation in African populations of Drosophila melanogaster, which are considered ancestral to the species, we generated a large polymorphism data set representing 2.1 Mb from each of 20 individuals from a Ugandan population of D. melanogaster. In contrast to previous inferences of a simple population expansion in eastern Africa, our demographic modeling of this ancestral population reveals a strong signature of a population bottleneck followed by population expansion, which has significant implications for future demographic modeling of derived populations of this species. Taking this more complex underlying demographic history into account, we also estimate a mean X-linked region-wide rate of adaptation of 6 × 10⁻¹¹/site/generation and a mean selection coefficient of beneficial mutations of 0.0009. These inferences regarding the rate and strength of selection are largely consistent with most other estimates from D. melanogaster and indicate a relatively high rate of adaptation driven by weakly beneficial mutations.
|
26
|
Clemente F, Vogl C. Evidence for complex selection on four-fold degenerate sites in Drosophila melanogaster. J Evol Biol 2012; 25:2582-95. [PMID: 23020078 DOI: 10.1111/jeb.12003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 07/11/2012] [Revised: 08/31/2012] [Accepted: 08/31/2012] [Indexed: 01/04/2023]
Abstract
We considered genome-wide four-fold degenerate sites from an African Drosophila melanogaster population and compared them to short introns. To include divergence and to polarize the data, we used its close relatives Drosophila simulans, Drosophila sechellia, Drosophila erecta and Drosophila yakuba as outgroups. In D. melanogaster, the GC content at four-fold degenerate sites is higher than in short introns; compared to its relatives, more AT than GC is fixed. The former has been explained by codon usage bias (CUB) favouring GC; the latter by decreased intensity of directional selection or by increased mutation bias towards AT. With a biallelic equilibrium model, evidence for directional selection comes mostly from the GC-rich ancestral base composition. Together with a slight mutation bias, it leads to an asymmetry of the unpolarized allele frequency spectrum, from which directional selection is inferred. Using a quasi-equilibrium model and polarized spectra, however, only purifying and no directional selection is detected. Furthermore, polarized spectra are proportional to those of the presumably unselected short introns. As we have no evidence for a decrease in effective population size, relaxed CUB must be due to a reduction in the selection coefficient. Going beyond the biallelic model and considering all four bases, signs of directional selection are stronger. In contrast to short introns, complementary bases show strand specificity and allele frequency spectra depend on mutation directions. Hence, the traditional biallelic model to describe the evolution of four-fold degenerate sites should be replaced by more complex models assuming only quasi-equilibrium and accounting for all four bases.
Affiliation(s)
- F Clemente
- Institute of Population Genetics, Veterinärmedizinische Universität Wien, Vienna, Austria
|
27
|
Clemente F, Vogl C. Unconstrained evolution in short introns? - an analysis of genome-wide polymorphism and divergence data from Drosophila. J Evol Biol 2012; 25:1975-1990. [PMID: 22901008 DOI: 10.1111/j.1420-9101.2012.02580.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 04/20/2012] [Revised: 06/15/2012] [Accepted: 06/22/2012] [Indexed: 12/23/2022]
Abstract
An unconstrained reference sequence facilitates the detection of selection. In Drosophila, sequence variation in short introns seems to be least influenced by selection and dominated by mutation and drift. Here, we test this with genome-wide sequences using an African population (Malawi) of D. melanogaster and data from the related outgroup species D. simulans, D. sechellia, D. erecta and D. yakuba. The distribution of mutations deviates from equilibrium, and the content of A and T (AT) nucleotides shows an excess of variance among introns. We explain this by a complex mutational pattern: a shift in mutational bias towards AT, leading to a slight nonequilibrium in base composition and context-dependent mutation rates, with G or C (GC) sites mutating most frequently in AT-rich introns. By comparing the corresponding allele frequency spectra of AT-rich vs. GC-rich introns, we can rule out the influence of directional selection or biased gene conversion on the mutational pattern. Compared with neutral equilibrium expectations, polymorphism spectra show an excess of low frequency and a paucity of intermediate frequency variants, irrespective of the direction of mutation. Combining the information from different outgroups with the polymorphism data and using a generalized linear model, we find evidence for shared ancestral polymorphism between D. melanogaster and D. simulans, D. sechellia, arguing against a bottleneck in D. melanogaster. Generally, we find that short introns can be used as a neutral reference on a genome-wide level, if the spatially and temporally varying mutational pattern is accounted for.
Affiliation(s)
- F Clemente
- Institute of Population Genetics, Veterinärmedizinische Universität Wien, Vienna, Austria
- C Vogl
- Institute of Animal Breeding and Genetics, Veterinärmedizinische Universität Wien, Vienna, Austria
|
28
|
Ishikawa SA, Inagaki Y, Hashimoto T. RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity. Evol Bioinform Online 2012; 8:357-71. [PMID: 22798721 PMCID: PMC3394461 DOI: 10.4137/ebo.s9017] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/03/2023] Open
Abstract
In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
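The RY-coding step mentioned in this abstract, converting the four bases into purines and pyrimidines, can be sketched as follows. This is only an illustration of the recoding itself, not the authors' analysis pipeline; the names `RY_MAP`, `ry_code`, and `ry_frequencies` are hypothetical.

```python
# Hypothetical lookup table: purines (A, G) -> R, pyrimidines (C, T, U) -> Y.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_code(seq):
    """Recode a nucleotide sequence into R/Y symbols.

    The input is upper-cased first; gaps and ambiguity codes that are
    not in RY_MAP are passed through unchanged.
    """
    return "".join(RY_MAP.get(base, base) for base in seq.upper())


def ry_frequencies(seq):
    """Return the R and Y proportions among the recoded sites."""
    coded = ry_code(seq)
    total = coded.count("R") + coded.count("Y")
    if total == 0:
        return {"R": 0.0, "Y": 0.0}
    return {"R": coded.count("R") / total, "Y": coded.count("Y") / total}


# A GC-rich and an AT-rich toy sequence collapse to the same RY string,
# which is how recoding can mask differences in base composition.
gc_rich = ry_code("GGCC")
at_rich = ry_code("AATT")
```

Both toy sequences become `RRYY` despite opposite GC content, which illustrates what the abstract means by normalizing base frequencies across a tree. The price is that transitions (A/G and C/T changes) become invisible after recoding.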
Affiliation(s)
- Sohta A Ishikawa
- Graduate School of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki 305-8572, Japan
|
29
|
Chachick R, Tanay A. Inferring divergence of context-dependent substitution rates in Drosophila genomes with applications to comparative genomics. Mol Biol Evol 2012; 29:1769-80. [PMID: 22319143 DOI: 10.1093/molbev/mss056] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023] Open
Abstract
Nucleotide substitution is a major evolutionary driving force that can incrementally and stochastically give rise to broad divergence patterns among species. The substitution process at each genomic position is frequently modeled independently of the other positions, although complex interactions between nearby bases are known to significantly affect mutation rates. Here, we study the evolution of 12 fly genomes using new algorithms for accurate inference of parameter-rich substitution models. By comparing models between lineages, we reveal the evolutionary histories of substitution rates at different flanking nucleotide contexts. We demonstrate these driving forces of molecular evolution to be constantly changing, suggesting that neutral drift of mutation rates is an important factor in the evolution of genomes and their sequence composition. This observation is used to develop a scalable approach for parameter-rich comparative genomics. By screening short DNA sequences, we demonstrate how homeoboxes and other transcription factor binding motifs are highly conserved based on our parameter-rich models but not according to standard conservation assays. With the increasing availability of genome sequences, rich substitution models become an attractive and practical approach for evolutionary analysis in general and comparative genomics in particular.
Affiliation(s)
- Ran Chachick
- Department of Computer Science and Applied Mathematics, Weizmann Institute, Rehovot, Israel
|
30
|
Vogl C, Clemente F. The allele-frequency spectrum in a decoupled Moran model with mutation, drift, and directional selection, assuming small mutation rates. Theor Popul Biol 2012; 81:197-209. [PMID: 22269092 PMCID: PMC3315028 DOI: 10.1016/j.tpb.2012.01.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 03/17/2011] [Revised: 12/23/2011] [Accepted: 01/04/2012] [Indexed: 01/22/2023]
Abstract
We analyze a decoupled Moran model with haploid population size N, a biallelic locus under mutation and drift with scaled forward and backward mutation rates θ1=μ1N and θ0=μ0N, and directional selection with scaled strength γ=sN. With small scaled mutation rates θ0 and θ1, which is appropriate for single nucleotide polymorphism data in highly recombining regions, we derive a simple approximate equilibrium distribution for polymorphic alleles with a constant of proportionality. We also put forth an even simpler model, where all mutations originate from monomorphic states. Using this model we derive the sojourn times, conditional on the ancestral and fixed allele, and under equilibrium the distributions of fixed and polymorphic alleles and fixation rates. Furthermore, we also derive the distribution of small samples in the diffusion limit and provide convenient recurrence relations for calculating this distribution. This enables us to give formulas analogous to the Ewens–Watterson estimator of θ for biased mutation rates and selection. We apply this theory to a polymorphism dataset of fourfold degenerate sites in Drosophila melanogaster.
Affiliation(s)
- Claus Vogl
- Institute of Animal Breeding and Genetics, Veterinärmedizinische Universität Wien, Veterinärplatz 1, A-1210 Vienna, Austria.
|
31
|
Conceição IC, Long AD, Gruber JD, Beldade P. Genomic sequence around butterfly wing development genes: annotation and comparative analysis. PLoS One 2011; 6:e23778. [PMID: 21909358 PMCID: PMC3166123 DOI: 10.1371/journal.pone.0023778] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 05/23/2011] [Accepted: 07/27/2011] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation.
MESH Headings
- Alcohol Dehydrogenase/genetics
- Animals
- Base Composition/genetics
- Base Sequence
- Bombyx/genetics
- Butterflies/genetics
- Butterflies/growth & development
- Chromosomes, Artificial, Bacterial/genetics
- Computational Biology
- Conserved Sequence/genetics
- DNA Transposable Elements/genetics
- DNA, Intergenic/genetics
- Databases, Genetic
- Expressed Sequence Tags
- Gene Order/genetics
- Genes, Developmental/genetics
- Genes, Insect/genetics
- MicroRNAs/genetics
- Molecular Sequence Annotation
- Molecular Sequence Data
- Open Reading Frames/genetics
- Phylogeny
- Repetitive Sequences, Nucleic Acid/genetics
- Reproducibility of Results
- Sequence Homology, Nucleic Acid
- Synteny/genetics
- Wings, Animal/growth & development
- Wings, Animal/metabolism
Affiliation(s)
- Anthony D. Long
- University of California Irvine, Irvine, California, United States of America
- Jonathan D. Gruber
- University of California Irvine, Irvine, California, United States of America
- Patrícia Beldade
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- Institute of Biology, Leiden University, Leiden, The Netherlands
|
32
|
Huang H, He Q, Kubatko LS, Knowles LL. Sources of Error Inherent in Species-Tree Estimation: Impact of Mutational and Coalescent Effects on Accuracy and Implications for Choosing among Different Methods. Syst Biol 2010; 59:573-83. [DOI: 10.1093/sysbio/syq047] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.9] [Indexed: 11/13/2022] Open
Affiliation(s)
- Huateng Huang
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA
- Qixin He
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA
- Laura S. Kubatko
- Department of Statistics
- Department of Evolution, Ecology, and Organismal Biology, Ohio State University, Columbus, OH 43210, USA
- L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, Museum of Zoology, University of Michigan, Ann Arbor, MI 48109-1079, USA
|
33
|
Clark NL, Aquadro CF. A novel method to detect proteins evolving at correlated rates: identifying new functional relationships between coevolving proteins. Mol Biol Evol 2009; 27:1152-61. [PMID: 20044587 DOI: 10.1093/molbev/msp324] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022] Open
Abstract
Interacting proteins evolve at correlated rates, possibly as the result of evolutionary pressures shared by functional groups and/or coevolution between interacting proteins. This evolutionary signature can be exploited to learn more about protein networks and to infer functional relationships between proteins on a genome-wide scale. Multiple methods have been introduced that detect correlated evolution using amino acid distances. One assumption made by these methods is that the neutral rate of nucleotide substitution is uniform over time; however, this is unlikely and such rate heterogeneity would adversely affect amino acid distance methods. We explored alternative methods that detect correlated rates using protein-coding nucleotide sequences in order to better estimate the rate of nonsynonymous substitution at each branch (dN) normalized by the underlying synonymous substitution rate (dS). Our novel likelihood method, which was robust to realistic simulation parameters, was tested on Drosophila nuclear pore proteins, which form a complex with well-documented physical interactions. The method revealed significantly correlated evolution between nuclear pore proteins, where members of a stable subcomplex showed stronger correlations compared with those proteins that interact transiently. Furthermore, our likelihood approach was better able to detect correlated evolution among closely related species than previous methods. Hence, these sequence-based methods are a complementary approach for detecting correlated evolution and could be applied genome-wide to provide candidate protein-protein interactions and functional group assignments using just coding sequences.
Affiliation(s)
- Nathaniel L Clark
- Department of Molecular Biology and Genetics, Cornell University, USA.
|
34
|
Singh ND, Larracuente AM, Sackton TB, Clark AG. Comparative Genomics on the Drosophila Phylogenetic Tree. Annu Rev Ecol Evol Syst 2009. [DOI: 10.1146/annurev.ecolsys.110308.120214] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
With the sequencing of 12 complete euchromatic Drosophila genomes, the genus Drosophila is a leading model for comparative genomics. In this review, we discuss the novel insights into evolutionary processes afforded by the newly available genomic sequences when placed in the context of the phylogeny. We focus on three levels: insights into whole-genome content, such as changes in genome size and content across the phylogeny; insights into large-scale patterns of divergence and conservation, such as selective constraints on genes and chromosome-level evolution of sex chromosomes; and insights into finer-scale processes in individual lineages and genes, such as lineage-specific evolution in response to ecological context. As the field of comparative genomics is still young, we also discuss current challenges, such as the development of more sophisticated evolutionary models to capture nonequilibrium processes and the improvement of assembly and alignment algorithms to better capture uncertainty in the data.
Affiliation(s)
- Nadia D. Singh
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853
- Amanda M. Larracuente
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853
- Timothy B. Sackton
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138
- Andrew G. Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853
|