1
Joseph J. Increased Positive Selection in Highly Recombining Genes Does not Necessarily Reflect an Evolutionary Advantage of Recombination. Mol Biol Evol 2024; 41:msae107. [PMID: 38829800 PMCID: PMC11173204 DOI: 10.1093/molbev/msae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/08/2024] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open
Abstract
It is commonly thought that the long-term advantage of meiotic recombination is to dissipate genetic linkage, allowing natural selection to act independently on different loci. It is thus theoretically expected that genes with higher recombination rates evolve under more effective selection. On the other hand, recombination is often associated with GC-biased gene conversion (gBGC), which theoretically interferes with selection by promoting the fixation of deleterious GC alleles. To test these predictions, several studies assessed whether selection was more effective in highly recombining genes (due to dissipation of genetic linkage) or less effective (due to gBGC), assuming a fixed distribution of fitness effects (DFE) for all genes. In this study, I directly derive the DFE from a gene's evolutionary history (shaped by mutation, selection, drift, and gBGC) under empirical fitness landscapes. I show that genes that have experienced high levels of gBGC are less fit and thus have more opportunities for beneficial mutations. Only a small decrease in the genome-wide intensity of gBGC leads to the fixation of these beneficial mutations, particularly in highly recombining genes. This results in increased positive selection in highly recombining genes that is not caused by more effective selection. Additionally, I show that the death of a recombination hotspot can lead to a higher dN/dS than its birth, but with substitution patterns biased towards AT, and only at selected positions. This shows that controlling for a substitution bias towards GC is therefore not sufficient to rule out the contribution of gBGC to signatures of accelerated evolution. Finally, although gBGC does not affect the fixation probability of GC-conservative mutations, I show that by altering the DFE, gBGC can also significantly affect nonsynonymous GC-conservative substitution patterns.
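The interaction between selection and gBGC described in this abstract can be illustrated with a minimal sketch. It assumes the common textbook approximation in which gBGC acts like an additive shift `b` on the scaled selection coefficient `s`: positive for AT-to-GC mutations, negative for GC-to-AT mutations, and zero for GC-conservative mutations. The function names and parameter values are hypothetical and are not taken from the paper.

```python
import math

STRONG = {"G", "C"}  # "strong" bases; A and T are "weak"


def relative_fixation_probability(scaled_coeff: float) -> float:
    """Fixation probability relative to a neutral mutation: S / (1 - exp(-S))."""
    if abs(scaled_coeff) < 1e-9:
        return 1.0  # neutral limit
    return scaled_coeff / -math.expm1(-scaled_coeff)


def gbgc_shift(ancestral: str, derived: str, b: float) -> float:
    """Assumed additive effect of gBGC on the scaled coefficient."""
    anc_strong = ancestral in STRONG
    der_strong = derived in STRONG
    if der_strong and not anc_strong:
        return b       # AT -> GC is favoured by gBGC
    if anc_strong and not der_strong:
        return -b      # GC -> AT is disfavoured by gBGC
    return 0.0         # GC-conservative: unaffected


def fixation_with_gbgc(ancestral: str, derived: str, s: float, b: float) -> float:
    """Relative fixation probability under selection (s) plus gBGC (b)."""
    return relative_fixation_probability(s + gbgc_shift(ancestral, derived, b))


if __name__ == "__main__":
    # A deleterious AT -> GC mutation can fix more often than a neutral one
    # once the gBGC shift outweighs selection.
    print(fixation_with_gbgc("A", "G", s=-1.0, b=3.0))
    # A GC-conservative mutation with the same s is not affected by b.
    print(fixation_with_gbgc("A", "T", s=-1.0, b=3.0))
```

Under this assumed model, a deleterious GC allele can fix more readily than a neutral mutation whenever `b` exceeds the magnitude of `s`, which is the sense in which gBGC interferes with selection.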
Affiliation(s)
- Julien Joseph
  - Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, CNRS, UMR 5558, Villeurbanne, France
2
Schwaha T, Decker SH, Baranyi C, Saadi AJ. Rediscovering the unusual, solitary bryozoan Monobryozoon ambulans Remane, 1936: first molecular and new morphological data clarify its phylogenetic position. Front Zool 2024; 21:5. [PMID: 38443908 PMCID: PMC10913646 DOI: 10.1186/s12983-024-00527-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 02/26/2024] [Indexed: 03/07/2024] Open
Abstract
BACKGROUND: One of the most peculiar groups of the mostly colonial phylum Bryozoa is the taxon Monobryozoon, whose name already implies non-colonial members of the phylum. Its peculiarity and highly unusual lifestyle as a meiobenthic clade living on sand grains has fascinated many biologists. In particular, its systematic relationship to other bryozoans remains a mystery. Despite numerous searches for M. ambulans in its type locality Helgoland, a locality with a long-standing marine station and a tradition of numerous courses and workshops, it had never been reencountered until now. Here we report the first observations of this almost mythical species, Monobryozoon ambulans. RESULTS: For the first time since 1938, we present new, modern morphological analyses of this species as well as the first ever molecular data. Our detailed morphological analysis confirms most previous descriptions, but also ascertains the presence of special ambulatory polymorphic zooids. We consider these as bud anlagen that ultimately separate consecutively from the animal, rendering it pseudo-colonial. The remaining morphological data show strong ties to alcyonidioidean ctenostome bryozoans. Our morphological data are in accordance with the phylogenomic analysis, which clusters it with species of Alcyonidium as a sister group to multiporate ctenostomes. Divergence time estimation and ancestral state reconstruction recover the solitary state of M. ambulans as a derived character that probably evolved in the Late Cretaceous. In this study, we also provide the entire mitogenome of M. ambulans, which, despite the momentary lack of comparable data, provides important data on a unique and rare species for future comparative work. CONCLUSIONS: We were able to provide the first sequence data and modern morphological data for the unique bryozoan M. ambulans, both of which support an alcyonidioidean relationship within ctenostome bryozoans.
Affiliation(s)
- Thomas Schwaha
  - Department of Evolutionary Biology, University of Vienna, Schlachthausgasse 43, 1030, Vienna, Austria
- Sebastian H Decker
  - Department of Evolutionary Biology, University of Vienna, Schlachthausgasse 43, 1030, Vienna, Austria
- Christian Baranyi
  - Department of Evolutionary Biology, University of Vienna, Schlachthausgasse 43, 1030, Vienna, Austria
- Ahmed J Saadi
  - Department of Evolutionary Biology, University of Vienna, Schlachthausgasse 43, 1030, Vienna, Austria
3
Marlétaz F, Timoshevskaya N, Timoshevskiy VA, Parey E, Simakov O, Gavriouchkina D, Suzuki M, Kubokawa K, Brenner S, Smith JJ, Rokhsar DS. The hagfish genome and the evolution of vertebrates. Nature 2024; 627:811-820. [PMID: 38262590 PMCID: PMC10972751 DOI: 10.1038/s41586-024-07070-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 01/15/2024] [Indexed: 01/25/2024]
Abstract
As the only surviving lineages of jawless fishes, hagfishes and lampreys provide a crucial window into early vertebrate evolution [1-3]. Here we investigate the complex history, timing and functional role of genome-wide duplications [4-7] and programmed DNA elimination [8,9] in vertebrates in the light of a chromosome-scale genome sequence for the brown hagfish Eptatretus atami. Combining evidence from syntenic and phylogenetic analyses, we establish a comprehensive picture of vertebrate genome evolution, including an auto-tetraploidization (1RV) that predates the early Cambrian cyclostome-gnathostome split, followed by a mid-late Cambrian allo-tetraploidization (2RJV) in gnathostomes and a prolonged Cambrian-Ordovician hexaploidization (2RCY) in cyclostomes. Subsequently, hagfishes underwent extensive genomic changes, with chromosomal fusions accompanied by the loss of genes that are essential for organ systems (for example, genes involved in the development of eyes and in the proliferation of osteoclasts); these changes account, in part, for the simplification of the hagfish body plan [1,2]. Finally, we characterize programmed DNA elimination in hagfish, identifying protein-coding genes and repetitive elements that are deleted from somatic cell lineages during early development. The elimination of these germline-specific genes provides a mechanism for resolving genetic conflict between soma and germline by repressing germline and pluripotency functions, paralleling findings in lampreys [10,11]. Reconstruction of the early genomic history of vertebrates provides a framework for further investigations of the evolution of cyclostomes and jawed vertebrates.
Affiliation(s)
- Ferdinand Marlétaz
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Elise Parey
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Oleg Simakov
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
  - Department for Neurosciences and Developmental Biology, University of Vienna, Vienna, Austria
- Daria Gavriouchkina
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
  - UK Dementia Research Institute, University College London, London, UK
- Masakazu Suzuki
  - Department of Science, Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Kaoru Kubokawa
  - Ocean Research Institute, The University of Tokyo, Tokyo, Japan
- Sydney Brenner
  - Comparative and Medical Genomics Laboratory, Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore
- Jeramiah J Smith
  - Department of Biology, University of Kentucky, Lexington, KY, USA
- Daniel S Rokhsar
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
4
Marlétaz F, Timoshevskaya N, Timoshevskiy V, Simakov O, Parey E, Gavriouchkina D, Suzuki M, Kubokawa K, Brenner S, Smith J, Rokhsar DS. The hagfish genome and the evolution of vertebrates. bioRxiv 2023:2023.04.17.537254. [PMID: 37131617 PMCID: PMC10153176 DOI: 10.1101/2023.04.17.537254] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 05/04/2023]
Abstract
As the only surviving lineages of jawless fishes, hagfishes and lampreys provide a critical window into early vertebrate evolution. Here, we investigate the complex history, timing, and functional role of genome-wide duplications in vertebrates in the light of a chromosome-scale genome of the brown hagfish Eptatretus atami. Using robust chromosome-scale (paralogon-based) phylogenetic methods, we confirm the monophyly of cyclostomes, document an auto-tetraploidization (1RV) that predated the origin of crown group vertebrates ~517 Mya, and establish the timing of subsequent independent duplications in the gnathostome and cyclostome lineages. Some 1RV gene duplications can be linked to key vertebrate innovations, suggesting that this early genomewide event contributed to the emergence of pan-vertebrate features such as neural crest. The hagfish karyotype is derived by numerous fusions relative to the ancestral cyclostome arrangement preserved by lampreys. These genomic changes were accompanied by the loss of genes essential for organ systems (eyes, osteoclast) that are absent in hagfish, accounting in part for the simplification of the hagfish body plan; other gene family expansions account for hagfishes' capacity to produce slime. Finally, we characterise programmed DNA elimination in somatic cells of hagfish, identifying protein-coding and repetitive elements that are deleted during development. As in lampreys, the elimination of these genes provides a mechanism for resolving genetic conflict between soma and germline by repressing germline/pluripotency functions. Reconstruction of the early genomic history of vertebrates provides a framework for further exploration of vertebrate novelties.
Affiliation(s)
- Ferdinand Marlétaz
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Oleg Simakov
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
  - Department of Molecular Evolution and Development, University of Vienna, Vienna, Austria
- Elise Parey
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London, UK
- Daria Gavriouchkina
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
  - Present address: UK Dementia Research Institute, University College London, London, UK
- Masakazu Suzuki
  - Department of Science, Graduate School of Integrated Science and Technology, Shizuoka University, Shizuoka, Japan
- Kaoru Kubokawa
  - Ocean Research Institute, The University of Tokyo, Tokyo, Japan
- Sydney Brenner
  - Comparative and Medical Genomics Laboratory, Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore 138673, Singapore
  - Deceased
- Jeramiah Smith
  - Department of Biology, University of Kentucky, Lexington, KY, USA
- Daniel S Rokhsar
  - Molecular Genetics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
5
A Compositional Heterogeneity Analysis of Mitochondrial Phylogenomics in Chalcidoidea Involving Two Newly Sequenced Mitogenomes of Eupelminae (Hymenoptera: Chalcidoidea). Genes (Basel) 2022; 13:genes13122340. [PMID: 36553606 PMCID: PMC9778353 DOI: 10.3390/genes13122340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 12/07/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022] Open
Abstract
As next-generation sequencing technology becomes more mature and the cost of sequencing continues to fall, researchers are increasingly using mitochondrial genomes to explore phylogenetic relationships among different groups. In this study, we sequenced and analyzed the complete mitochondrial genomes of Eupelmus anpingensis and Merostenus sp. We predicted the secondary structures of the tRNA genes of these two species and found that 21 of the 22 tRNA genes in Merostenus sp. exhibited typical clover-leaf structures, with trnS1 being the lone exception. In E. anpingensis, we found that, in addition to trnS1, the secondary structure of trnE was also incomplete, with only the DHU arm and anticodon loop remaining. In addition, we found that compositional heterogeneity and variable rates of evolution are prevalent in Chalcidoidea. Under the homogeneity model, a Eupelmidae + Encyrtidae sister group relationship was proposed. Different datasets based on the heterogeneity model produced different tree topologies, but all tree topologies contained Chalcididae and Trichogrammatidae in the basal position of the tree. This is the first study to consider the phylogenetic relationships of Chalcidoidea by comparing a heterogeneity model with a homogeneity model.
6
Dong W, Li E, Liu Y, Xu C, Wang Y, Liu K, Cui X, Sun J, Suo Z, Zhang Z, Wen J, Zhou S. Phylogenomic approaches untangle early divergences and complex diversifications of the olive plant family. BMC Biol 2022; 20:92. [PMID: 35468824 PMCID: PMC9040247 DOI: 10.1186/s12915-022-01297-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 12/02/2021] [Accepted: 04/13/2022] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: Deep-branching phylogenetic relationships are often difficult to resolve because phylogenetic signals are obscured by the long history and complexity of evolutionary processes, such as ancient introgression/hybridization, polyploidization, and incomplete lineage sorting (ILS). Phylogenomics has been effective in providing information for resolving both deep- and shallow-scale relationships across all branches of the tree of life. The olive family (Oleaceae) is composed of 25 genera classified into five tribes with tribe Oleeae consisting of four subtribes. Previous phylogenetic analyses showed that ILS and/or hybridization led to phylogenetic incongruence in the family. It was essential to distinguish phylogenetic signal conflicts, and explore mechanisms for the uncertainties concerning relationships of the olive family, especially at the deep-branching nodes. RESULTS: We used the whole plastid genome and nuclear single nucleotide polymorphism (SNP) data to infer the phylogenetic relationships and to assess the variation and rates among the main clades of the olive family. We also used 2608 and 1865 orthologous nuclear genes to infer the deep-branching relationships among tribes of Oleaceae and subtribes of tribe Oleeae, respectively. Concatenated and coalescence trees based on the plastid genome, nuclear SNPs and multiple nuclear genes suggest events of ILS and/or ancient introgression during the diversification of Oleaceae. Additionally, there was extreme heterogeneity in the substitution rates across the tribes. Furthermore, our results supported that introgression/hybridization, rather than ILS, is the main factor for phylogenetic discordance among the five tribes of Oleaceae. The tribe Oleeae is supported to have originated via ancient hybridization and polyploidy, and its most likely parentages are the ancestral lineage of Jasmineae or its sister group, which is a "ghost lineage," and Forsythieae. However, ILS and ancient introgression are mainly responsible for the phylogenetic discordance among the four subtribes of tribe Oleeae. CONCLUSIONS: This study showcases that using multiple sequence datasets (plastid genomes, nuclear SNPs and thousands of nuclear genes) and diverse phylogenomic methods such as data partition, heterogeneous models, quantifying introgression via branch lengths (QuIBL) analysis, and species network analysis can facilitate untangling long and complex evolutionary processes of ancient introgression, paleopolyploidization, and ILS.
Affiliation(s)
- Wenpan Dong
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Enze Li
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Yanlei Liu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Chao Xu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Yushuang Wang
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Kangjia Liu
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Xingyong Cui
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Jiahui Sun
  - State Key Laboratory Breeding Base of Dao-di Herbs, National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Zhili Suo
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Zhixiang Zhang
  - Laboratory of Systematic Evolution and Biogeography of Woody Plants, School of Ecology and Nature Conservation, Beijing Forestry University, Beijing, 100083, China
- Jun Wen
  - Department of Botany, National Museum of Natural History, Smithsonian Institution, Washington, DC, 20013-7012, USA
- Shiliang Zhou
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
7
Shang Y, Ren L, Zhang X, Li Y, Zhang C, Guo Y. Characterization and Comparative Analysis of Mitochondrial Genomes Among the Calliphoridae (Insecta: Diptera: Oestroidea) and Phylogenetic Implications. Front Genet 2022; 13:799203. [PMID: 35251125 PMCID: PMC8891575 DOI: 10.3389/fgene.2022.799203] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/21/2021] [Accepted: 01/27/2022] [Indexed: 11/21/2022] Open
Abstract
The Calliphoridae (blowflies) are significant for forensic science, veterinary management, medical science, and economic issues. However, the phylogenetic relationships within this family are poorly understood and controversial, and in recent years the status of the Calliphoridae has been a crucial problem for understanding the evolutionary relationships of the Oestroidea. In the present study, seven mitochondrial genomes (mitogenomes), including six calliphorid species and one Polleniidae species, were sequenced and annotated. A comparative mitochondrial genomic analysis among the Calliphoridae is then presented. Additionally, the phylogenetic relationship of the Calliphoridae within the larger context of the other Oestroidea was reconstructed based on the mitogenomic datasets using maximum likelihood (ML) and Bayesian inference (BI) methods. The results suggest that the gene arrangement, codon usage, and base composition are conserved within the calliphorid species. The phylogenetic analysis based on the mitogenomic dataset recovered the Calliphoridae as monophyletic and inferred the following topology within Oestroidea: (Oestridae (Sarcophagidae (Calliphoridae + (Polleniidae + (Mesembrinellidae + Tachinidae))))). However, because the number of exemplar species is limited, further studies are required. Within the Calliphoridae, the Chrysomyinae were recovered as sister taxon to Luciliinae + Calliphorinae. Our analyses indicated that mitogenomic data have the potential for illuminating the phylogenetic relationships in the Oestroidea as well as for the classification of the Calliphoridae.
Affiliation(s)
- Yadong Guo
  - *Correspondence: Changquan Zhang; Yadong Guo
8
Hugoson E, Guliaev A, Ammunét T, Guy L. Host-adaptation in Legionellales is 1.9 Ga, coincident with eukaryogenesis. Mol Biol Evol 2022; 39:6527638. [PMID: 35167692 PMCID: PMC8896642 DOI: 10.1093/molbev/msac037] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 11/13/2022] Open
Abstract
Bacteria adapting to living in a host cell caused the most salient events in the evolution of eukaryotes, namely the seminal fusion with an archaeon, and the emergence of both mitochondrion and chloroplast. A bacterial clade that may hold the key to understanding these events is the deep-branching gammaproteobacterial order Legionellales (containing, among others, Coxiella and Legionella), of which all known members grow inside eukaryotic cells. Here, by analyzing 35 novel Legionellales genomes mainly acquired through metagenomics, we show that this group is much more diverse than previously thought, and that key host-adaptation events took place very early in its evolution. Crucial virulence factors like the Type IVB secretion (Dot/Icm) system and two shared effector proteins were gained in the last Legionellales common ancestor (LLCA). Many metabolic gene families were lost in LLCA and its immediate descendants, including functions directly and indirectly related to molybdenum metabolism. On the other hand, genome sizes increased in the ancestors of the Legionella genus. We estimate that LLCA lived circa 1.89 Ga ago, probably predating the last eukaryotic common ancestor (LECA) by circa 0.4-1.0 Ga. These elements strongly indicate that host-adaptation arose only once in Legionellales, and that these bacteria were using advanced molecular machinery to exploit and manipulate host cells early in eukaryogenesis.
Affiliation(s)
- Eric Hugoson
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratories, Uppsala University, Box 582, 75123, Uppsala, Sweden
  - Department of Microbial Population Biology, Max Planck Institute for Evolutionary Biology, Plön, D-24306, Germany
- Andrei Guliaev
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratories, Uppsala University, Box 582, 75123, Uppsala, Sweden
- Tea Ammunét
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratories, Uppsala University, Box 582, 75123, Uppsala, Sweden
- Lionel Guy
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratories, Uppsala University, Box 582, 75123, Uppsala, Sweden
9
Chen J, Zhang Y, Shen B. Bioinformatics for the Origin and Evolution of Viruses. Adv Exp Med Biol 2022; 1368:53-71. [DOI: 10.1007/978-981-16-8969-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
10
Stark TL, Liberles DA. Characterizing Amino Acid Substitution with Complete Linkage of Sites on a Lineage. Genome Biol Evol 2021; 13:6377338. [PMID: 34581792 PMCID: PMC8557849 DOI: 10.1093/gbe/evab225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/17/2021] [Indexed: 11/16/2022] Open
Abstract
Amino acid substitution models are commonly used for phylogenetic inference, for ancestral sequence reconstruction, and for the inference of positive selection. All commonly used models explicitly assume that each site evolves independently, an assumption that is violated by both linkage and protein structural and functional constraints. We introduce two new models for amino acid substitution which incorporate linkage between sites, each based on the (population-genetic) Moran model. The first model is a generalized population process tracking arbitrarily many sites which undergo mutation, with individuals replaced according to their fitnesses. This model provides a reasonably complete framework for simulations but is numerically and analytically intractable. We also introduce a second model which includes several simplifying assumptions but for which some theoretical results can be derived. We analyze the simplified model to determine conditions where linkage is likely to have meaningful effects on sitewise substitution probabilities, as well as conditions under which the effects are likely to be negligible. These findings are an important step in the generation of tractable phylogenetic models that parameterize selective coefficients for amino acid substitution while accounting for linkage of sites leading to both hitchhiking and background selection.
Affiliation(s)
- Tristan L Stark
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
- David A Liberles
  - Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, USA
11
Latrille T, Lanore V, Lartillot N. Inferring long-term effective population size with Mutation-Selection Models. Mol Biol Evol 2021; 38:4573-4587. [PMID: 34191010 PMCID: PMC8476147 DOI: 10.1093/molbev/msab160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022] Open
Abstract
Mutation–selection phylogenetic codon models are grounded on population genetics first principles and represent a principled approach for investigating the intricate interplay between mutation, selection, and drift. In their current form, mutation–selection codon models are entirely characterized by the collection of site-specific amino-acid fitness profiles. However, thus far, they have relied on the assumption of a constant genetic drift, translating into a unique effective population size (Ne) across the phylogeny, clearly an unrealistic assumption. This assumption can be alleviated by introducing variation in Ne between lineages. In addition to Ne, the mutation rate (μ) is susceptible to vary between lineages, and both should covary with life-history traits (LHTs). This suggests that the model should more globally account for the joint evolutionary process followed by all of these lineage-specific variables (Ne, μ, and LHTs). In this direction, we introduce an extended mutation–selection model jointly reconstructing in a Bayesian Monte Carlo framework the fitness landscape across sites and long-term trends in Ne, μ, and LHTs along the phylogeny, from an alignment of DNA coding sequences and a matrix of observed LHTs in extant species. The model was tested against simulated data and applied to empirical data in mammals, isopods, and primates. The reconstructed history of Ne in these groups appears to correlate with LHTs or ecological variables in a way that suggests that the reconstruction is reasonable, at least in its global trends. On the other hand, the range of variation in Ne inferred across species is surprisingly narrow. This last point suggests that some of the assumptions of the model, in particular concerning the assumed absence of epistatic interactions between sites, are potentially problematic.
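To show where the effective population size enters such a model, the substitution rate between codons can be written in the standard diploid mutation-selection form below. This is a sketch of the general framework, not necessarily the exact parameterization used in the paper; the symbols $\mu_{ab}$ (mutation rate from codon $a$ to codon $b$) and $f^{(i)}_a$ (fitness of the amino acid encoded by codon $a$ at site $i$) are notational assumptions.

```latex
% Substitution rate at site i from codon a to codon b
% (non-synonymous changes; synonymous changes reduce to q = \mu_{ab}).
\[
  q^{(i)}_{a \to b} \;=\; \mu_{ab}\,
  \frac{S^{(i)}_{ab}}{1 - e^{-S^{(i)}_{ab}}},
  \qquad
  S^{(i)}_{ab} \;=\; 4 N_e \left( f^{(i)}_b - f^{(i)}_a \right).
\]
% Neutral limit: S -> 0 gives q = \mu_{ab}.
% A larger N_e magnifies |S|, so selection against deleterious changes is stronger.
```

Because $N_e$ only appears through the product $4 N_e (f_b - f_a)$, letting it vary between lineages rescales every site-specific selection coefficient on that lineage at once.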
Affiliation(s)
- T Latrille
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622, Villeurbanne, France
  - École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France
- V Lanore
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622, Villeurbanne, France
- N Lartillot
  - Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622, Villeurbanne, France
12
Rodrigue N, Latrille T, Lartillot N. A Bayesian Mutation-Selection Framework for Detecting Site-Specific Adaptive Evolution in Protein-Coding Genes. Mol Biol Evol 2021; 38:1199-1208. [PMID: 33045094 PMCID: PMC7947879 DOI: 10.1093/molbev/msaa265] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022] Open
Abstract
In recent years, codon substitution models based on the mutation–selection principle have been extended for the purpose of detecting signatures of adaptive evolution in protein-coding genes. However, the approaches used to date have either focused on detecting global signals of adaptive regimes across the entire gene, or on contexts where experimentally derived, site-specific amino acid fitness profiles are available. Here, we present a Bayesian site-heterogeneous mutation–selection framework for site-specific detection of adaptive substitution regimes given a protein-coding DNA alignment. We offer implementations, briefly present simulation results, and apply the approach to a few real data sets. Our analyses suggest that the new approach shows greater sensitivity than traditional methods. However, more study is required to assess the impact of potential model violations on the method, and to gain a greater empirical sense of its behavior on a broader range of real data sets. We propose an outline of such a research program.
Affiliation(s)
- Nicolas Rodrigue
- Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, Canada
| | - Thibault Latrille
- Université de Lyon, Université Lyon 1, CNRS; UMR 5558, Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, F-69622, France
| | - Nicolas Lartillot
- Université de Lyon, Université Lyon 1, CNRS; UMR 5558, Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, F-69622, France
| |
|
13
|
Ritchie AM, Stark TL, Liberles DA. Inferring the number and position of changes in selective regime in a non-equilibrium mutation-selection framework. BMC Ecol Evol 2021; 21:39. [PMID: 33691618 PMCID: PMC7944921 DOI: 10.1186/s12862-021-01770-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/30/2020] [Accepted: 02/25/2021] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND: Recovering the historical patterns of selection acting on a protein coding sequence is a major goal of evolutionary biology. Mutation-selection models address this problem by explicitly modelling fixation rates as a function of site-specific amino acid fitness values. However, they are restricted in their utility for investigating directional evolution because they require prior knowledge of the locations of fitness changes in the lineages of a phylogeny. RESULTS: We apply a modified mutation-selection methodology that relaxes assumptions of equilibrium and time-reversibility. Our implementation allows us to identify branches where adaptive or compensatory shifts in the fitness landscape have taken place, signalled by a change in amino acid fitness profiles. Through simulation and analysis of an empirical data set of β-lactamase genes, we test our ability to recover the position of adaptive events within the tree and successfully reconstruct initial codon frequencies and fitness profile parameters generated under the non-stationary model. CONCLUSION: We demonstrate successful detection of selective shifts and identification of the affected branch on partitions of 300 codons or more. We successfully reconstruct fitness parameters and initial codon frequencies in simulated data and demonstrate that failing to account for non-equilibrium evolution can increase the error in fitness profile estimation. We also demonstrate reconstruction of plausible shifts in amino acid fitnesses in the bacterial β-lactamase family and discuss some caveats for interpretation.
Affiliation(s)
- Andrew M Ritchie
- Department of Biology, Temple University, 1900 North 12th Street, Philadelphia, PA, USA
| | - Tristan L Stark
- Department of Biology, Temple University, 1900 North 12th Street, Philadelphia, PA, USA
| | - David A Liberles
- Department of Biology, Temple University, 1900 North 12th Street, Philadelphia, PA, USA.
| |
|
14
|
Del Amparo R, Branco C, Arenas J, Vicens A, Arenas M. Analysis of selection in protein-coding sequences accounting for common biases. Brief Bioinform 2021; 22:6105943. [PMID: 33479739 DOI: 10.1093/bib/bbaa431] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/16/2020] [Revised: 12/17/2020] [Accepted: 12/22/2020] [Indexed: 12/16/2022] Open
Abstract
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others, optimizing the subsequent protein stability and activity. The analysis of selection in this type of genetic data is commonly performed with the nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most of the well-established methodologies to estimate this metric make crucial assumptions, such as lack of recombination or invariable codon frequencies along genes, which can bias the estimation. Here, we review the most relevant biases in dN/dS estimation and provide a detailed guide to estimating this metric using state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS, emphasizing the importance of considering complementary biological information such as the role of the observed substitutions in the stability and function of proteins. This review is intended to help evolutionary biologists who aim to accurately estimate selection in protein-coding sequences.
Affiliation(s)
- Roberto Del Amparo
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Catarina Branco
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Jesús Arenas
- Unit of Microbiology and Immunology, University of Zaragoza, 50013 Zaragoza, Spain
| | - Alberto Vicens
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| | - Miguel Arenas
- CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain; Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| |
|
15
|
Johnson MM, Wilke CO. Site-Specific Amino Acid Distributions Follow a Universal Shape. J Mol Evol 2020; 88:731-741. [PMID: 33230664 PMCID: PMC7717668 DOI: 10.1007/s00239-020-09976-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/05/2020] [Accepted: 11/17/2020] [Indexed: 11/25/2022]
Abstract
In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: Either they provide a coarse-grained description that lacks biophysical realism (e.g., dN/dS models), or they require a large number of parameters to be fitted (e.g., mutation-selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well both in alignments of empirical protein sequences and of sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.
Affiliation(s)
- Mackenzie M Johnson
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA.
| |
|
16
|
Peijnenburg KTCA, Janssen AW, Wall-Palmer D, Goetze E, Maas AE, Todd JA, Marlétaz F. The origin and diversification of pteropods precede past perturbations in the Earth's carbon cycle. Proc Natl Acad Sci U S A 2020; 117:25609-25617. [PMID: 32973093 PMCID: PMC7568333 DOI: 10.1073/pnas.1920918117] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/13/2022] Open
Abstract
Pteropods are a group of planktonic gastropods that are widely regarded as biological indicators for assessing the impacts of ocean acidification. Their aragonitic shells are highly sensitive to acute changes in ocean chemistry. However, to gain insight into their potential to adapt to current climate change, we need to accurately reconstruct their evolutionary history and assess their responses to past changes in the Earth's carbon cycle. Here, we resolve the phylogeny and timing of pteropod evolution with a phylogenomic dataset (2,654 genes) incorporating new data for 21 pteropod species and revised fossil evidence. In agreement with traditional taxonomy, we recovered molecular support for a division between "sea butterflies" (Thecosomata; mucus-web feeders) and "sea angels" (Gymnosomata; active predators). Molecular dating demonstrated that these two lineages diverged in the early Cretaceous, and that all main pteropod clades, including shelled, partially-shelled, and unshelled groups, diverged in the mid- to late Cretaceous. Hence, these clades originated prior to and subsequently survived major global change events, including the Paleocene-Eocene Thermal Maximum (PETM), the closest analog to modern-day ocean acidification and warming. Our findings indicate that planktonic aragonitic calcifiers have shown resilience to perturbations in the Earth's carbon cycle over evolutionary timescales.
Affiliation(s)
- Katja T C A Peijnenburg
- Plankton Diversity and Evolution, Naturalis Biodiversity Center, 2300 RA Leiden, The Netherlands;
- Department Freshwater and Marine Ecology, Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, 1090 GE Amsterdam, The Netherlands
| | - Arie W Janssen
- Plankton Diversity and Evolution, Naturalis Biodiversity Center, 2300 RA Leiden, The Netherlands
| | - Deborah Wall-Palmer
- Plankton Diversity and Evolution, Naturalis Biodiversity Center, 2300 RA Leiden, The Netherlands
| | - Erica Goetze
- Department of Oceanography, University of Hawai'i at Mānoa, Honolulu, HI 96822
| | - Amy E Maas
- Bermuda Institute of Ocean Sciences, St. Georges GE01, Bermuda
| | - Jonathan A Todd
- Department of Earth Sciences, Natural History Museum, London SW7 5BD, United Kingdom
| | - Ferdinand Marlétaz
- Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, United Kingdom;
- Molecular Genetics Unit, Okinawa Institute of Science and Technology, Onna-son 904-0495, Japan
| |
|
17
|
Youssef N, Susko E, Bielawski JP. Consequences of Stability-Induced Epistasis for Substitution Rates. Mol Biol Evol 2020; 37:3131-3148. [DOI: 10.1093/molbev/msaa151] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 01/18/2023] Open
Abstract
Do interactions between residues in a protein (i.e., epistasis) significantly alter evolutionary dynamics? If so, what consequences might they have on inference from traditional codon substitution models which assume site-independence for the sake of computational tractability? To investigate the effects of epistasis on substitution rates, we employed a mechanistic mutation-selection model in conjunction with a fitness framework derived from protein stability. We refer to this as the stability-informed site-dependent (S-SD) model and developed a new stability-informed site-independent (S-SI) model that captures the average effect of stability constraints on individual sites of a protein. Comparison of S-SI and S-SD offers a novel and direct method for investigating the consequences of stability-induced epistasis on protein evolution. We developed S-SI and S-SD models for three natural proteins and showed that they generate sequences consistent with real alignments. Our analyses revealed that epistasis tends to increase substitution rates compared with the rates under site-independent evolution. We then assessed the epistatic sensitivity of individual sites and discovered a counterintuitive effect: Highly connected sites were less influenced by epistasis relative to exposed sites. Lastly, we show that, despite the unrealistic assumptions, traditional models perform comparably well in the presence and absence of epistasis and provide reasonable summaries of average selection intensities. We conclude that epistatic models are critical to understanding protein evolutionary dynamics, but epistasis might not be required for reasonable inference of selection pressure when averaging over time and sites.
Affiliation(s)
- Noor Youssef
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Edward Susko
- Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Joseph P Bielawski
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Centre for Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
| |
|
18
|
Ren L, Zhang X, Li Y, Shang Y, Chen S, Wang S, Qu Y, Cai J, Guo Y. Comparative analysis of mitochondrial genomes among the subfamily Sarcophaginae (Diptera: Sarcophagidae) and phylogenetic implications. Int J Biol Macromol 2020; 161:214-222. [PMID: 32526299 DOI: 10.1016/j.ijbiomac.2020.06.043] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 04/02/2020] [Revised: 05/25/2020] [Accepted: 06/05/2020] [Indexed: 12/16/2022]
Abstract
The subfamily Sarcophaginae is extremely diverse in morphology, habit and geographical distribution, and is usually considered to be of considerable ecological, medical, and forensic significance. In the present study, 18 mitochondrial genomes (mitogenomes) of sarcophagid flies were obtained for the first time. The arrangement and orientation of genes were identical to those of ancestral insects. The degrees of compositional heterogeneity in the datasets were extremely low. Furthermore, 13 protein-coding genes were evolving under purifying selection. The phylogenetic relationship of the genus-group taxa Boettcheria + (Sarcophaga + (Peckia + (Ravinia + Oxysarcodexia))) was strongly supported. Four subgenera were recovered as monophyletic (Liopygia, Liosarcophaga, Pierretia, Heteronychia), whereas Parasarcophaga was recovered as polyphyletic. The sister-relationships between S. dux and S. aegyptiaca, and between S. pingi and S. kawayuensis, were recovered. Moreover, the molecular phylogenetic relationships among the subgenera Helicophagella, Kozlovea, Kramerea, Pandelleisca, Phallocheira, Pseudothyrsocnema, Sinonipponia and Seniorwhitea were rarely put forward prior to this study. This study provides insight into the population genetics, molecular biology, and phylogeny of the subfamily Sarcophaginae, especially the subgeneric classification of Sarcophaga. However, compared with the enormous species diversity of flesh flies, the available mitogenomes are still limited for recovering the phylogeny of Sarcophaginae.
Affiliation(s)
- Lipin Ren
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha, Hunan, China
| | - Xiangyan Zhang
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha, Hunan, China
| | - Yi Li
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha, Hunan, China
| | - Yanjie Shang
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha, Hunan, China
| | - Shan Chen
- School of Ecological and Environmental Sciences, East China Normal University, Shanghai, China
| | - Shiwen Wang
- Department of Forensic Science, School of Basic Medical Sciences, Xinjiang Medical University, Ürümqi, Xinjiang, China
| | - Yihong Qu
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha, Hunan, China
| | - Jifeng Cai
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha, Hunan, China
| | - Yadong Guo
- Department of Forensic Science, School of Basic Medical Sciences, Central South University, Changsha, Hunan, China.
| |
|
19
|
Abstract
Snails, earthworms and flatworms are remarkably different animals, but they all exhibit a very similar mode of early embryogenesis: spiral cleavage. This is one of the most widespread developmental programs in animals, probably ancestral to almost half of the animal phyla, and therefore its study is essential for understanding animal development and evolution. However, our knowledge of spiral cleavage is still in its infancy. Recent technical and conceptual advances, such as the establishment of genome editing and improved phylogenetic resolution, are paving the way for a fresher and deeper look into this fascinating early cleavage mode.
Affiliation(s)
- José M Martín-Durán
- Queen Mary, University of London, School of Biological and Chemical Sciences, Mile End Road, E1 4NS London, UK
| | - Ferdinand Marlétaz
- Molecular Genetics Unit, Okinawa Institute of Science & Technology, 1919-1, Tancha, Onna 904-0495, Japan
| |
|
20
|
Siu-Ting K, Torres-Sánchez M, San Mauro D, Wilcockson D, Wilkinson M, Pisani D, O'Connell MJ, Creevey CJ. Inadvertent Paralog Inclusion Drives Artifactual Topologies and Timetree Estimates in Phylogenomics. Mol Biol Evol 2019; 36:1344-1356. [PMID: 30903171 PMCID: PMC6526904 DOI: 10.1093/molbev/msz067] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Indexed: 11/13/2022] Open
Abstract
Increasingly, large phylogenomic data sets include transcriptomic data from nonmodel organisms. This not only has allowed controversial and unexplored evolutionary relationships in the tree of life to be addressed but also increases the risk of inadvertent inclusion of paralogs in the analysis. Although this may be expected to result in decreased phylogenetic support, it is not clear if it could also drive highly supported artifactual relationships. Many groups, including the hyperdiverse Lissamphibia, are especially susceptible to these issues due to ancient gene duplication events and small numbers of sequenced genomes and because transcriptomes are increasingly applied to resolve historically conflicting taxonomic hypotheses. We tested the potential impact of paralog inclusion on the topologies and timetree estimates of the Lissamphibia using published and de novo sequencing data including 18 amphibian species, from which 2,656 single-copy gene families were identified. A novel paralog filtering approach resulted in four differently curated data sets, which were used for phylogenetic reconstructions using Bayesian inference, maximum likelihood, and quartet-based supertrees. We found that paralogs drive strongly supported conflicting hypotheses within the Lissamphibia (Batrachia and Procera) and older divergence time estimates even within groups where no variation in topology was observed. All investigated methods, except Bayesian inference with the CAT-GTR model, were found to be sensitive to paralogs, but with filtering convergence to the same answer (Batrachia) was observed. This is the first large-scale study to address the impact of orthology selection using transcriptomic data and emphasizes the importance of quality over quantity particularly for understanding relationships of poorly sampled taxa.
Affiliation(s)
- Karen Siu-Ting
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast, United Kingdom; School of Biotechnology, Dublin City University, Glasnevin, Dublin, Ireland; Dpto. de Herpetología, Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima, Perú
| | - María Torres-Sánchez
- Department of Biodiversity, Ecology, and Evolution, Complutense University of Madrid, Madrid, Spain; Department of Neuroscience, Spinal Cord and Brain Injury Research Center and Ambystoma Genetic Stock Center, University of Kentucky, Lexington, KY
| | - Diego San Mauro
- Department of Biodiversity, Ecology, and Evolution, Complutense University of Madrid, Madrid, Spain
| | - David Wilcockson
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
| | - Mark Wilkinson
- Department of Life Sciences, Natural History Museum, London, United Kingdom
| | - Davide Pisani
- Life Sciences Building, University of Bristol, Bristol, United Kingdom
| | - Mary J O'Connell
- School of Biology, Faculty of Biological Sciences, University of Leeds, Leeds, United Kingdom; School of Life Sciences, University of Nottingham, University Park, United Kingdom
| | - Christopher J Creevey
- Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, Belfast, United Kingdom
| |
|
21
|
Laurin-Lemay S, Rodrigue N, Lartillot N, Philippe H. Conditional Approximate Bayesian Computation: A New Approach for Across-Site Dependency in High-Dimensional Mutation-Selection Models. Mol Biol Evol 2019; 35:2819-2834. [PMID: 30203003 DOI: 10.1093/molbev/msy173] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022] Open
Abstract
A key question in molecular evolutionary biology concerns the relative roles of mutation and selection in shaping genomic data. Moreover, features of mutation and selection are heterogeneous along the genome and over time. Mechanistic codon substitution models based on the mutation-selection framework are promising approaches to separating these effects. In practice, however, several complications arise, since accounting for such heterogeneities often implies handling models of high dimensionality (e.g., amino acid preferences), or leads to across-site dependence (e.g., CpG hypermutability), making the likelihood function intractable. Approximate Bayesian Computation (ABC) could address this latter issue. Here, we propose a new approach, named Conditional ABC (CABC), which combines the sampling efficiency of MCMC and the flexibility of ABC. To illustrate the potential of the CABC approach, we apply it to the study of mammalian CpG hypermutability based on a new mutation-level parameter implying dependence across adjacent sites, combined with site-specific purifying selection on amino-acids captured by a Dirichlet process. Our proof-of-concept of the CABC methodology opens new modeling perspectives. Our application of the method reveals a high level of heterogeneity of CpG hypermutability across loci and mild heterogeneity across taxonomic groups; and finally, we show that CpG hypermutability is an important evolutionary factor in rendering relative synonymous codon usage. All source code is available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
Affiliation(s)
- Simon Laurin-Lemay
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
| | - Nicolas Rodrigue
- Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
| | - Nicolas Lartillot
- Laboratoire de Biométrie et Biologie Évolutive, UMR CNRS 5558, Université Lyon 1, Lyon, France
| | - Hervé Philippe
- Robert-Cedergren Center for Bioinformatics and Genomics, Department of Biochemistry and Molecular Medicine, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada; Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Écologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, France
| |
|
22
|
Hubert J, Nesvorna M, Kopecky J, Erban T, Klimov P. Population and Culture Age Influence the Microbiome Profiles of House Dust Mites. Microb Ecol 2019; 77:1048-1066. [PMID: 30465068 DOI: 10.1007/s00248-018-1294-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/30/2018] [Accepted: 11/13/2018] [Indexed: 05/09/2023]
Abstract
Interactions with microorganisms might enable house dust mites (HDMs) to derive nutrients from difficult-to-digest structural proteins and to flourish in human houses. We tested this hypothesis by investigating the effects of changes in the mite culture growth and population of two HDM species on HDM microbiome composition and fitness. Growing cultures of laboratory and industrial allergen-producing populations of Dermatophagoides farinae (DFL and DFT, respectively) and Dermatophagoides pteronyssinus (DPL and DPT, respectively) were sampled at four time points. The symbiotic microorganisms of the mites were characterized by DNA barcode sequencing and quantified by qPCR using universal/specific primers. The population growth of mites and nutrient contents of mite bodies were measured and correlated with the changes in bacteria in the HDM microbiome. The results showed that both the population and culture age significantly influenced the microbiome profiles. Cardinium formed 93% and 32% of the total sequences of the DFL and DFT bacterial microbiomes, respectively, but this bacterial species was less abundant in the DPL and DPT microbiomes. Staphylococcus abundance was positively correlated with increased glycogen contents in the bodies of mites, and increased abundances of Aspergillus, Candida, and Kocuria were correlated with increased lipid contents in the bodies of mites. The xerophilic fungus Wallemia accounted for 39% of the fungal sequences in the DPL microbiome, but its abundance was low in the DPT, DFL, and DFT microbiomes. With respect to the mite culture age, we made three important observations: the mite population growth from young cultures was 5-8-fold higher than that from old cultures; specimens from old cultures had greater abundances of fungi and bacteria in their bodies; and yeasts predominated in the gut contents of specimens from young cultures, whereas filamentous mycelium prevailed in specimens from old cultures. Our results are consistent with the hypothesis that mites derive nutrients through associations with microorganisms.
Affiliation(s)
- Jan Hubert
- Crop Research Institute, Drnovska 507/73, CZ-16106, Prague 6-Ruzyne, Czechia.
| | - Marta Nesvorna
- Crop Research Institute, Drnovska 507/73, CZ-16106, Prague 6-Ruzyne, Czechia
| | - Jan Kopecky
- Crop Research Institute, Drnovska 507/73, CZ-16106, Prague 6-Ruzyne, Czechia
| | - Tomas Erban
- Crop Research Institute, Drnovska 507/73, CZ-16106, Prague 6-Ruzyne, Czechia
| | - Pavel Klimov
- Department of Ecology and Evolutionary Biology, University of Michigan, 3600 Varsity Drive, Ann Arbor, MI, 48109-2228, USA
- Institute of Biology, University of Tyumen, Pirogova 3, Tyumen, Russia, 625043
| |
|
23
|
Beaulieu JM, O’Meara BC, Zaretzki R, Landerer C, Chai J, Gilchrist MA. Population Genetics Based Phylogenetics Under Stabilizing Selection for an Optimal Amino Acid Sequence: A Nested Modeling Approach. Mol Biol Evol 2019; 36:834-851. [PMID: 30521036 PMCID: PMC6445302 DOI: 10.1093/molbev/msy222] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022] Open
Abstract
We present a new phylogenetic approach, selection on amino acids and codons (SelAC), whose substitution rates are based on a nested model linking protein expression to population genetics. Unlike simpler codon models that assume a single substitution matrix for all sites, our model more realistically represents the evolution of protein-coding DNA under the assumption of consistent, stabilizing selection using a cost-benefit approach. This cost-benefit approach allows us to generate a set of 20 optimal amino acid-specific matrix families using just a handful of parameters and naturally links the strength of stabilizing selection to protein synthesis levels, which we can estimate. Using a yeast data set of 100 orthologs for 6 taxa, we find SelAC fits the data much better than popular models by 10^4-10^5 Akaike information criterion units adjusted for small sample bias. Our results also indicated that nested, mechanistic models better predict observed data patterns, highlighting the improvement in biological realism in amino acid sequence evolution that our model provides. Additional parameters estimated by SelAC indicate that a large amount of nonphylogenetic, but biologically meaningful, information can be inferred from existing data. For example, SelAC prediction of gene-specific protein synthesis rates correlates well with both empirical (r=0.33-0.48) and other theoretical predictions (r=0.45-0.64) for multiple yeast species. SelAC also provides estimates of the optimal amino acid at each site. Finally, because SelAC is a nested approach based on clearly stated biological assumptions, future modifications, such as including shifts in the optimal amino acid sequence within or across lineages, are possible.
Affiliation(s)
- Jeremy M Beaulieu
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
| | - Brian C O’Meara
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
| | | | - Cedric Landerer
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
| | - Juanjuan Chai
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Suite 1039, White Plains, NY
| | - Michael A Gilchrist
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
| |
|
24
|
Laurin-Lemay S, Philippe H, Rodrigue N. Multiple Factors Confounding Phylogenetic Detection of Selection on Codon Usage. Mol Biol Evol 2019; 35:1463-1472. [PMID: 29596640 DOI: 10.1093/molbev/msy047] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 02/06/2023] Open
Abstract
Detecting selection on codon usage (CU) is a difficult task, since CU can be shaped by both the mutational process and selective constraints operating at the DNA, RNA, and protein levels. Yang and Nielsen (2008) developed a test (which we call CUYN) for detecting selection on CU using two competing mutation-selection models of codon substitution. The null model assumes that CU is determined by the mutation bias alone, whereas the alternative model assumes that both mutation bias and/or selection act on CU. In applications on mammalian-scale alignments, the CUYN test detects selection on CU for numerous genes. This is surprising, given the small effective population size of mammals, and prompted us to use simulations to evaluate the robustness of the test to model violations. Simulations using a modest level of CpG hypermutability completely mislead the test, with 100% false positives. Surprisingly, a high level of false positives (56.1%) resulted simply from using the HKY mutation-level parameterization within the CUYN test on simulations conducted with a GTR mutation-level parameterization. Finally, by using a crude optimization procedure on a parameter controlling the CpG hypermutability rate, we find that this mutational property could explain a very large part of the observed mammalian CU. Altogether, our work emphasizes the need to evaluate the potential impact of model violations on statistical tests in the field of molecular phylogenetic analysis. The source code of the simulator and the mammalian genes used are available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
Affiliation(s)
- Simon Laurin-Lemay: Department of Biochemistry and Molecular Medicine, Robert-Cedergren Center for Bioinformatics and Genomics, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada
- Hervé Philippe: Department of Biochemistry and Molecular Medicine, Robert-Cedergren Center for Bioinformatics and Genomics, Faculty of Medicine, Université de Montréal, Montréal, QC, Canada; Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Écologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, Ariège, France
- Nicolas Rodrigue: Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, ON, Canada
|
25
|
Jones CT, Youssef N, Susko E, Bielawski JP. Phenomenological Load on Model Parameters Can Lead to False Biological Conclusions. Mol Biol Evol 2019; 35:1473-1488. [PMID: 29596684 DOI: 10.1093/molbev/msy049] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/15/2022] Open
Abstract
When a substitution model is fitted to an alignment using maximum likelihood, its parameters are adjusted to account for as much site-pattern variation as possible. A parameter might therefore absorb a substantial quantity of the total variance in an alignment (or more formally, bring about a substantial reduction in the deviance of the fitted model) even if the process it represents played no role in the generation of the data. When this occurs, we say that the parameter estimate carries phenomenological load (PL). Large PL in a parameter estimate is a concern because it not only invalidates its mechanistic interpretation (if it has one) but also increases the likelihood that it will be found to be statistically significant. The problem of PL was not identified in the past because most off-the-shelf substitution models make simplifying assumptions that preclude the generation of realistic levels of variation. In this study, we use the more realistic mutation-selection framework as the basis of a generating model formulated to produce data that mimic an alignment of mammalian mitochondrial DNA. We show that a parameter estimate can carry PL when 1) the substitution model is underspecified and 2) the parameter represents a process that is confounded with other processes represented in the data-generating model. We then provide a method that can be used to identify signal for the process that a given parameter represents despite the existence of PL.
Affiliation(s)
- Christopher T Jones: Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- Noor Youssef: Department of Biology, Dalhousie University, Halifax, NS, Canada
- Edward Susko: Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
|
26
|
Kazmi SO, Rodrigue N. Detecting amino acid preference shifts with codon-level mutation-selection mixture models. BMC Evol Biol 2019; 19:62. [PMID: 30808289 PMCID: PMC6390532 DOI: 10.1186/s12862-019-1358-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 08/15/2018] [Accepted: 01/11/2019] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND: In recent years, increasing attention has been placed on the development of phylogeny-based statistical methodologies for uncovering site-specific changes in amino acid fitness profiles over time. The few available random-effects approaches, modelling across-site variation in amino acid profiles as random variables drawn from a statistical law, either lack a mechanistic codon-level formulation, or pose significant computational challenges. RESULTS: Here, we bring together a few existing ideas to explore a simple and fast method based on a predefined finite mixture of amino acid profiles within a codon-level substitution model following the mutation-selection formulation. Our study is focused on the detection of site-specific shifts in amino acid profiles over a known sub-clade of a tree, using simulations with and without shifts over the sub-clade to study the properties of the method. Through modifications of the values of the amino acid profiles, our simulations show different levels of reliability under different forms of finite mixture models. Sites identified by our method in a real data set show obvious overlap with those identified using previous methods, with some notable differences. CONCLUSION: Overall, our results show that when a site-specific shift in amino acid profile is strongly pronounced, involving two clearly different sets of profiles, the method performs very well; but shifts between profiles that share many features are difficult to correctly identify, highlighting the challenging nature of the problem.
Affiliation(s)
- S Omar Kazmi: Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
- Nicolas Rodrigue: Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada; Institute of Biochemistry and School of Mathematics and Statistics, Carleton University, 1125 Colonel By Drive, Ottawa, K1S 5B6, Canada
|
27
|
A New Spiralian Phylogeny Places the Enigmatic Arrow Worms among Gnathiferans. Curr Biol 2019; 29:312-318.e3. [PMID: 30639106 DOI: 10.1016/j.cub.2018.11.042] [Citation(s) in RCA: 134] [Impact Index Per Article: 26.8] [Received: 05/31/2018] [Revised: 09/12/2018] [Accepted: 11/14/2018] [Indexed: 11/23/2022]
Abstract
Chaetognaths (arrow worms) are an enigmatic group of marine animals whose phylogenetic position remains elusive, in part because they display a mix of developmental and morphological characters associated with other groups [1, 2]. In particular, it remains unclear whether they are a sister group to protostomes [1, 2], one of the principal animal superclades, or whether they bear a closer relationship with some spiralian phyla [3, 4]. Addressing the phylogenetic position of chaetognaths and refining our understanding of relationships among spiralians are essential to fully comprehend character changes during bilaterian evolution [5]. To tackle these questions, we generated new transcriptomes for ten chaetognath species, compiling an extensive phylogenomic dataset that maximizes data occupancy and taxonomic representation. We employed inference methods that consider rate and compositional heterogeneity across taxa to avoid limitations of earlier analyses [6]. In this way, we greatly improved the resolution of the protostome tree of life. We find that chaetognaths cluster together with rotifers, gnathostomulids, and micrognathozoans within an expanded Gnathifera clade and that this clade is the sister group to other spiralians [7, 8]. Our analysis shows that several previously proposed groupings are likely due to systematic error, and we propose a revised organization of Lophotrochozoa with three main clades: Tetraneuralia (mollusks and entoprocts), Lophophorata (brachiopods, phoronids, and ectoprocts), and a third unnamed clade gathering annelids, nemerteans, and platyhelminthes. Consideration of classical morphological, developmental, and genomic characters in light of this topology indicates secondary loss as a fundamental trend in spiralian evolution.
|
28
|
Looking for Darwin in Genomic Sequences: Validity and Success Depends on the Relationship Between Model and Data. Methods Mol Biol 2019; 1910:399-426. [PMID: 31278672 DOI: 10.1007/978-1-4939-9074-0_13] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/01/2023]
Abstract
Codon substitution models (CSMs) are commonly used to infer the history of natural selection for a set of protein-coding sequences, often with the explicit goal of detecting the signature of positive Darwinian selection. However, the validity and success of CSMs used in conjunction with the maximum likelihood (ML) framework is sometimes challenged with claims that the approach might too often support false conclusions. In this chapter, we use a case study approach to identify four legitimate statistical difficulties associated with inference of evolutionary events using CSMs. These include: (1) model misspecification, (2) low information content, (3) the confounding of processes, and (4) phenomenological load, or PL. While past criticisms of CSMs can be connected to these issues, the historical critiques were often misdirected, or overstated, because they failed to recognize that the success of any model-based approach depends on the relationship between model and data. Here, we explore this relationship and provide a candid assessment of the limitations of CSMs to extract historical information from extant sequences. To aid in this assessment, we provide a brief overview of: (1) a more realistic way of thinking about the process of codon evolution framed in terms of population genetic parameters, and (2) a novel presentation of the ML statistical framework. We then divide the development of CSMs into two broad phases of scientific activity and show that the latter phase is characterized by increases in model complexity that can sometimes negatively impact inference of evolutionary mechanisms. Such problems are not yet widely appreciated by the users of CSMs. These problems can be avoided by using a model that is appropriate for the data; but, understanding the relationship between the data and a fitted model is a difficult task. We argue that the only way to properly understand that relationship is to perform in silico experiments using a generating process that can mimic the data as closely as possible. The mutation-selection modeling framework (MutSel) is presented as the basis of such a generating process. We contend that if complex CSMs continue to be developed for testing explicit mechanistic hypotheses, then additional analyses such as those described here (e.g., penalized LRTs and estimation of PL) will need to be applied alongside the more traditional inferential methods.
|
29
|
Figueroa-Martinez F, Jackson C, Reyes-Prieto A. Plastid Genomes from Diverse Glaucophyte Genera Reveal a Largely Conserved Gene Content and Limited Architectural Diversity. Genome Biol Evol 2019; 11:174-188. [PMID: 30534986 PMCID: PMC6330054 DOI: 10.1093/gbe/evy268] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Accepted: 12/10/2018] [Indexed: 12/30/2022] Open
Abstract
Plastid genome (ptDNA) data of Glaucophyta have been limited for many years to the genus Cyanophora. Here, we sequenced the ptDNAs of Gloeochaete wittrockiana, Cyanoptyche gloeocystis, Glaucocystis incrassata, and Glaucocystis sp. BBH. The reported sequences are the first genome-scale plastid data available for these three poorly studied glaucophyte genera. Although the Glaucophyta plastids appear morphologically “ancestral,” they actually bear derived genomes not radically different from those of red algae or viridiplants. The glaucophyte plastid coding capacity is highly conserved (112 genes shared) and the architecture of the plastid chromosomes is relatively simple. Phylogenomic analyses recovered Glaucophyta as the earliest diverging Archaeplastida lineage, but the position of viridiplants as the first branching group was not rejected by the approximately unbiased test. Pairwise distances estimated from 19 different plastid genes revealed that the highest sequence divergence between glaucophyte genera is frequently higher than distances between species of different classes within red algae or viridiplants. Gene synteny and sequence similarity in the ptDNAs of the two Glaucocystis species analyzed are conserved. However, the ptDNA of Gla. incrassata contains a 7.9-kb insertion not detected in Glaucocystis sp. BBH. The insertion contains ten open reading frames that include four coding regions similar to bacterial serine recombinases (two open reading frames), DNA primases, and peptidoglycan aminohydrolases. These three enzymes, often encoded in bacterial plasmids and bacteriophage genomes, are known to participate in the mobilization and replication of DNA mobile elements. It is therefore plausible that the insertion in Gla. incrassata ptDNA is derived from a DNA mobile element.
Affiliation(s)
- Francisco Figueroa-Martinez: Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada; CONACyT-Universidad Autónoma Metropolitana Iztapalapa, Biotechnology Department, Mexico City, Mexico
- Christopher Jackson: Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada; School of Biosciences, University of Melbourne, Melbourne, Australia
- Adrian Reyes-Prieto: Department of Biology, University of New Brunswick, Fredericton, New Brunswick, Canada
|
30
|
Abstract
In this chapter, we give a not-so-long and self-contained introduction to computational molecular evolution. In particular, we present the emergence of the use of likelihood-based methods, review the standard DNA substitution models, and introduce how model choice operates. We also present recent developments in inferring absolute divergence times and rates on a phylogeny, before showing how state-of-the-art models take inspiration from diffusion theory to link population genetics, which traditionally focuses at a taxonomic level below that of the species, and molecular evolution. Although this is not a cookbook chapter, we try and point to popular programs and implementations along the way.
|
31
|
Hilton SK, Bloom JD. Modeling site-specific amino-acid preferences deepens phylogenetic estimates of viral sequence divergence. Virus Evol 2018; 4:vey033. [PMID: 30425841 PMCID: PMC6220371 DOI: 10.1093/ve/vey033] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 02/06/2023] Open
Abstract
Molecular phylogenetics is often used to estimate the time since the divergence of modern gene sequences. For highly diverged sequences, such phylogenetic techniques sometimes estimate surprisingly recent divergence times. In the case of viruses, independent evidence indicates that the estimates of deep divergence times from molecular phylogenetics are sometimes too recent. This discrepancy is caused in part by inadequate models of purifying selection leading to branch-length underestimation. Here we examine the effect on branch-length estimation of using models that incorporate experimental measurements of purifying selection. We find that models informed by experimentally measured site-specific amino-acid preferences estimate longer deep branches on phylogenies of influenza virus hemagglutinin. This lengthening of branches is due to more realistic stationary states of the models, and is mostly independent of the branch-length extension from modeling site-to-site variation in amino-acid substitution rate. The branch-length extension from experimentally informed site-specific models is similar to that achieved by other approaches that allow the stationary state to vary across sites. However, the improvements from all of these site-specific but time homogeneous and site independent models are limited by the fact that a protein’s amino-acid preferences gradually shift as it evolves. Overall, our work underscores the importance of modeling site-specific amino-acid preferences when estimating deep divergence times—but also shows the inherent limitations of approaches that fail to account for how these preferences shift over time.
Affiliation(s)
- Sarah K Hilton: Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center; Department of Genome Sciences, University of Washington, USA
- Jesse D Bloom: Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center; Department of Genome Sciences, University of Washington, USA; Howard Hughes Medical Institute, Seattle, WA, USA
|
32
|
Brown MW, Heiss AA, Kamikawa R, Inagaki Y, Yabuki A, Tice AK, Shiratori T, Ishida KI, Hashimoto T, Simpson AGB, Roger AJ. Phylogenomics Places Orphan Protistan Lineages in a Novel Eukaryotic Super-Group. Genome Biol Evol 2018; 10:427-433. [PMID: 29360967 PMCID: PMC5793813 DOI: 10.1093/gbe/evy014] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Accepted: 01/18/2018] [Indexed: 01/13/2023] Open
Abstract
Recent phylogenetic analyses position certain “orphan” protist lineages deep in the tree of eukaryotic life, but their exact placements are poorly resolved. We conducted phylogenomic analyses that incorporate deeply sequenced transcriptomes from representatives of collodictyonids (diphylleids), rigifilids, Mantamonas, and ancyromonads (planomonads). Analyses of 351 genes, using site-heterogeneous mixture models, strongly support a novel super-group-level clade that includes collodictyonids, rigifilids, and Mantamonas, which we name “CRuMs”. Further, they robustly place CRuMs as the closest branch to Amorphea (including animals and fungi). Ancyromonads are strongly inferred to be more distantly related to Amorphea than are CRuMs. They emerge either as sister to malawimonads, or as a separate deeper branch. CRuMs and ancyromonads represent two distinct major groups that branch deeply on the lineage that includes animals, near the most commonly inferred root of the eukaryote tree. This makes both groups crucial in examinations of the deepest-level history of extant eukaryotes.
Affiliation(s)
- Matthew W Brown: Department of Biological Sciences, Mississippi State University, USA; Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, USA
- Aaron A Heiss: Department of Biology, and Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada; Department of Invertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA
- Ryoma Kamikawa: Graduate School of Human and Environmental Studies, Graduate School of Global Environmental Studies, Kyoto University, Japan
- Yuji Inagaki: Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan; Center for Computational Sciences, University of Tsukuba, Ibaraki, Japan
- Akinori Yabuki: Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa, Japan
- Alexander K Tice: Department of Biological Sciences, Mississippi State University, USA; Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, USA
- Takashi Shiratori: Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Ken-Ichiro Ishida: Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan
- Tetsuo Hashimoto: Graduate School of Life and Environmental Sciences, University of Tsukuba, Ibaraki, Japan; Center for Computational Sciences, University of Tsukuba, Ibaraki, Japan
- Alastair G B Simpson: Department of Biology, and Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J Roger: Department of Biochemistry and Molecular Biology, and Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
|
33
|
Mahato S, Nie J, Plachetzki DC, Zelhof AC. A mosaic of independent innovations involving eyes shut are critical for the evolutionary transition from fused to open rhabdoms. Dev Biol 2018; 443:188-202. [PMID: 30243673 DOI: 10.1016/j.ydbio.2018.09.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/19/2018] [Revised: 09/18/2018] [Accepted: 09/18/2018] [Indexed: 12/15/2022]
Abstract
A fundamental question in evolutionary biology is how developmental processes are modified to produce morphological innovations while abiding by functional constraints. Here we address this question by investigating the cellular mechanism responsible for the transition between fused and open rhabdoms in ommatidia of apposition compound eyes; a critical step required for the development of visual systems based on neural superposition. Utilizing Drosophila and Tribolium as representatives of fused and open rhabdom morphology in holometabolous insects respectively, we identified three changes required for this innovation to occur. First, the expression pattern of the extracellular matrix protein Eyes Shut (EYS) was co-opted and expanded from mechanosensory neurons to photoreceptor cells in taxa with open rhabdoms. Second, EYS homologs obtained a novel extension of the amino terminus leading to the internalization of a cleaved signal sequence. This amino terminus extension does not interfere with cleavage or function in mechanosensory neurons, but it does permit specific targeting of the EYS protein to the apical photoreceptor membrane. Finally, a specific interaction evolved between EYS and a subset of Prominin homologs that is required for the development of open, but not fused, rhabdoms. Together, our findings portray a case study wherein the evolution of a set of molecular novelties has precipitated the origin of an adaptive photoreceptor cell arrangement.
Affiliation(s)
- Simpla Mahato: Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Jing Nie: Department of Biology, Indiana University, Bloomington, IN 47405, USA
- David C Plachetzki: Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA
- Andrew C Zelhof: Department of Biology, Indiana University, Bloomington, IN 47405, USA
|
34
|
Using the Mutation-Selection Framework to Characterize Selection on Protein Sequences. Genes (Basel) 2018; 9:genes9080409. [PMID: 30104502 PMCID: PMC6115872 DOI: 10.3390/genes9080409] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 06/19/2018] [Revised: 08/02/2018] [Accepted: 08/09/2018] [Indexed: 12/13/2022] Open
Abstract
When mutational pressure is weak, the generative process of protein evolution involves explicit probabilities of mutations of different types coupled to their conditional probabilities of fixation dependent on selection. Establishing this mechanistic modeling framework for the detection of selection has been a goal in the field of molecular evolution. Building on a mathematical framework proposed more than a decade ago, numerous methods have been introduced in an attempt to detect and measure selection on protein sequences. In this review, we discuss the structure of the original model, subsequent advances, and the series of assumptions that these models operate under.
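A minimal sketch of the coupling this abstract describes, in which a mutation probability is multiplied by a selection-dependent fixation probability. The abstract gives no formula, so the weak-mutation expression used below (relative fixation rate S / (1 - exp(-S)) for a scaled fitness difference S) is an assumption taken from common practice in the mutation-selection literature. The function names and the numerical guards are hypothetical and chosen for illustration only.

```python
import math


def relative_fixation_rate(s_scaled: float) -> float:
    """Fixation rate of a mutation relative to a neutral one.

    Assumed form: S / (1 - exp(-S)), where S is the scaled difference in
    fitness between the mutant and the resident state.
    """
    if abs(s_scaled) < 1e-8:
        # Neutral limit: the expression tends to 1 as S tends to 0.
        return 1.0
    if s_scaled < -700.0:
        # Strongly deleterious: the rate is effectively zero, and this
        # guard avoids floating-point overflow in expm1.
        return 0.0
    return s_scaled / -math.expm1(-s_scaled)


def substitution_rate(mutation_rate: float,
                      fitness_from: float,
                      fitness_to: float) -> float:
    """Mutation rate times the relative fixation rate of the change."""
    return mutation_rate * relative_fixation_rate(fitness_to - fitness_from)


if __name__ == "__main__":
    # Hypothetical scaled fitness differences, for illustration only.
    for s in (-2.0, 0.0, 2.0):
        print(s, substitution_rate(1.0, 0.0, s))
```

Under this assumed form, beneficial changes fix faster than neutral ones and deleterious changes fix more slowly, and the ratio of forward to backward rates equals exp(S).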
|
35
|
Hubert J, Erban T, Kopecky J, Sopko B, Nesvorna M, Lichovnikova M, Schicht S, Strube C, Sparagano O. Comparison of Microbiomes between Red Poultry Mite Populations (Dermanyssus gallinae): Predominance of Bartonella-like Bacteria. Microbial Ecology 2017; 74:947-960. [PMID: 28534089 DOI: 10.1007/s00248-017-0993-z] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 01/23/2017] [Accepted: 05/01/2017] [Indexed: 05/09/2023]
Abstract
Blood-feeding red poultry mites (RPM) serve as vectors of pathogenic bacteria and viruses among vertebrate hosts including wild birds, poultry hens, mammals, and humans. The microbiome of RPM has not yet been studied by high-throughput sequencing. RPM eggs, larvae, and engorged adult/nymph samples obtained in four poultry houses in Czechia were used for microbiome analyses by Illumina amplicon sequencing of the 16S ribosomal RNA (rRNA) gene V4 region. A laboratory RPM population was used as positive control for transcriptome analysis by pyrosequencing with identification of sequences originating from bacteria. The samples of engorged adult/nymph stages had 100-fold more copies of the 16S rRNA gene than the samples of eggs and larvae. The microbiome composition showed differences among the four poultry houses and among observed developmental stadia. In the adults' microbiome, 10 OTUs comprised 90 to 99% of all sequences. Bartonella-like bacteria covered between 30 and 70% of sequences in the RPM microbiome and 25% of bacterial sequences in the transcriptome. The phylogenetic analyses of 16S rRNA gene sequences revealed two distinct groups of Bartonella-like bacteria forming sister groups: (i) symbionts of ants; (ii) Bartonella genus. Cardinium, Wolbachia, and Rickettsiella sp. were found in the microbiomes of all tested stadia, while Spiroplasma eriocheiris and Wolbachia were identified in the laboratory RPM transcriptome. The microbiomes from eggs, larvae, and engorged adults/nymphs differed. Bartonella-like symbionts were found in all stadia and sampling sites. Bartonella-like bacteria were the most diversified group within the RPM microbiome. The presence of identified putative pathogenic bacteria is relevant with respect to human and animal health issues, while the identification of symbiotic bacteria can lead to new control methods targeting them to destabilize the arthropod host.
Affiliation(s)
- Jan Hubert: Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 161 06, Czechia
- Tomas Erban: Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 161 06, Czechia
- Jan Kopecky: Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 161 06, Czechia
- Bruno Sopko: Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 161 06, Czechia; Department of Medical Chemistry and Clinical Biochemistry, 2nd Faculty of Medicine, Charles University and Motol University Hospital, V Uvalu 84/1, Prague, 5150 06, Czechia
- Marta Nesvorna: Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 161 06, Czechia
- Martina Lichovnikova: Faculty of AgriSciences, Mendel University in Brno, Zemedelska 1665/1, Brno, 61 300, Czechia
- Sabine Schicht: Institute for Parasitology, Centre for Infection Medicine, University of Veterinary Medicine Hannover, Buenteweg 17, 30559, Hannover, Germany
- Christina Strube: Institute for Parasitology, Centre for Infection Medicine, University of Veterinary Medicine Hannover, Buenteweg 17, 30559, Hannover, Germany
- Olivier Sparagano: Vice-Chancellor Office, Centre for Applied Biological and Exercise Sciences, Coventry University, Priory Street, Coventry, CV1 5FB, UK
|
36
|
Sydykova DK, Jack BR, Spielman SJ, Wilke CO. Measuring evolutionary rates of proteins in a structural context. F1000Res 2017; 6:1845. [PMID: 29167739 DOI: 10.12688/f1000research.12874.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 11/18/2017] [Indexed: 11/20/2022] Open
Abstract
We describe how to measure site-specific rates of evolution in protein-coding genes and how to correlate these rates with structural features of the expressed protein, such as relative solvent accessibility, secondary structure, or weighted contact number. We present two alternative approaches to rate calculations: one based on relative amino-acid rates, and the other based on site-specific codon rates measured as dN/dS. We additionally provide a code repository containing scripts to facilitate the specific analysis protocols we recommend.
Affiliation(s)
- Dariya K Sydykova: Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Benjamin R Jack: Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Stephanie J Spielman: Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Claus O Wilke: Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
|
37
|
Sydykova DK, Jack BR, Spielman SJ, Wilke CO. Measuring evolutionary rates of proteins in a structural context. F1000Res 2017; 6:1845. [PMID: 29167739 PMCID: PMC5676193 DOI: 10.12688/f1000research.12874.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Accepted: 01/31/2018] [Indexed: 12/14/2022] Open
Abstract
We describe how to measure site-specific rates of evolution in protein-coding genes and how to correlate these rates with structural features of the expressed protein, such as relative solvent accessibility, secondary structure, or weighted contact number. We present two alternative approaches to rate calculations: one based on relative amino-acid rates, and the other based on site-specific codon rates measured as dN/dS. We additionally provide a code repository containing scripts to facilitate the specific analysis protocols we recommend.
Affiliation(s)
- Dariya K Sydykova: Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Benjamin R Jack: Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Stephanie J Spielman: Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Claus O Wilke: Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
|
38
|
Hilton SK, Doud MB, Bloom JD. phydms: software for phylogenetic analyses informed by deep mutational scanning. PeerJ 2017; 5:e3657. [PMID: 28785526 PMCID: PMC5541924 DOI: 10.7717/peerj.3657] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/20/2017] [Accepted: 07/15/2017] [Indexed: 11/30/2022] Open
Abstract
It has recently become possible to experimentally measure the effects of all amino-acid point mutations to proteins using deep mutational scanning. These experimental measurements can inform site-specific phylogenetic substitution models of gene evolution in nature. Here we describe software that efficiently performs analyses with such substitution models. This software, phydms, can be used to compare the results of deep mutational scanning experiments to the selection on genes in nature. Given a phylogenetic tree topology inferred with another program, phydms enables rigorous comparison of how well different experiments on the same gene capture actual natural selection. It also enables re-scaling of deep mutational scanning data to account for differences in the stringency of selection in the lab and nature. Finally, phydms can identify sites that are evolving differently in nature than expected from experiments in the lab. As data from deep mutational scanning experiments become increasingly widespread, phydms will facilitate quantitative comparison of the experimental results to the actual selection pressures shaping evolution in nature.
Affiliation(s)
- Sarah K Hilton
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, United States of America
- Michael B Doud
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, United States of America; Medical Scientist Training Program, University of Washington, Seattle, WA, United States of America
- Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, United States of America
|
39
|
Guy L. phyloSkeleton: taxon selection, data retrieval and marker identification for phylogenomics. Bioinformatics 2017; 33:1230-1232. [PMID: 28057682 PMCID: PMC5408842 DOI: 10.1093/bioinformatics/btw824] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/29/2016] [Accepted: 12/27/2016] [Indexed: 11/13/2022] Open
Abstract
Summary: With the wealth of available genome sequences, a difficult and tedious part of inferring phylogenomic trees is now to select genomes with an appropriate taxon density in the different parts of the tree. The package described here offers tools to easily select the most representative organisms, following a set of simple rules based on taxonomy and assembly quality, to retrieve the genomes from public databases (NCBI, JGI), to annotate them if necessary, to identify given markers in these, and to prepare files for multiple sequence alignment. Availability and Implementation: phyloSkeleton is a Perl module and is freely available under GPLv3 at https://bitbucket.org/lionelguy/phyloskeleton/. Contact: lionel.guy@imbim.uu.se.
Affiliation(s)
- Lionel Guy
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- To whom correspondence should be addressed.
|
40
|
Bloom JD. Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 2017; 12:1. [PMID: 28095902 PMCID: PMC5240389 DOI: 10.1186/s13062-016-0172-z] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 08/15/2016] [Accepted: 12/14/2016] [Indexed: 12/23/2022] Open
Abstract
Background: Sites of positive selection are identified by comparing observed evolutionary patterns to those expected under a null model for evolution in the absence of such selection. For protein-coding genes, the most common null model is that nonsynonymous and synonymous mutations fix at equal rates; this unrealistic model has limited power to detect many interesting forms of selection. Results: I describe a new approach that uses a null model based on experimental measurements of a gene’s site-specific amino-acid preferences generated by deep mutational scanning in the lab. This null model makes it possible to identify both diversifying selection for repeated amino-acid change and differential selection for mutations to amino acids that are unexpected given the measurements made in the lab. I show that this approach identifies sites of adaptive substitutions in four genes (lactamase, Gal4, influenza nucleoprotein, and influenza hemagglutinin) far better than a comparable method that simply compares the rates of nonsynonymous and synonymous substitutions. Conclusions: As rapid increases in biological data enable increasingly nuanced descriptions of the constraints on individual protein sites, approaches like the one here can improve our ability to identify many interesting forms of selection in natural sequences. Reviewers: This article was reviewed by Sebastian Maurer-Stroh, Olivier Tenaillon, and Tal Pupko. All three reviewers are members of the Biology Direct editorial board. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-016-0172-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jesse D Bloom
- Division of Basic Sciences and Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, 98109, WA, USA.
|
41
|
Expansion of the molecular and morphological diversity of Acanthamoebidae (Centramoebida, Amoebozoa) and identification of a novel life cycle type within the group. Biol Direct 2016; 11:69. [PMID: 28031045 PMCID: PMC5192571 DOI: 10.1186/s13062-016-0171-0] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 08/01/2016] [Accepted: 12/03/2016] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Acanthamoebidae is a "family" level amoebozoan group composed of the genera Acanthamoeba, Protacanthamoeba, and very recently Luapeleamoeba. This clade of amoebozoans has received considerable attention from the broader scientific community as Acanthamoeba spp. represent both model organisms and human pathogens. While the classical composition of the group (Acanthamoeba + Protacanthamoeba) has been well accepted due to the morphological and ultrastructural similarities of its members, the Acanthamoebidae has never been highly statistically supported in single gene phylogenetic reconstructions of Amoebozoa either by maximum likelihood (ML) or Bayesian analyses. RESULTS: Here we show using a phylogenomic approach that the Acanthamoebidae is a fully supported monophyletic group within Amoebozoa with both ML and Bayesian analyses. We also expand the known range of morphological and life cycle diversity found in the Acanthamoebidae by demonstrating that the amoebozoans "Protostelium" arachisporum, Dracoamoeba jormungandri n. g. n. sp., and Vacuolamoeba acanthoformis n. g. n. sp., belong within the group. We also found that "Protostelium" pyriformis is clearly a species of Acanthamoeba making it the first reported sporocarpic member of the genus, that is, an amoeba that individually forms a walled, dormant propagule elevated by a non-cellular stalk. Our phylogenetic analyses recover a fully supported Acanthamoebidae composed of five genera. Two of these genera (Acanthamoeba and Luapeleamoeba) have members that are sporocarpic. CONCLUSIONS: Our results provide high statistical support for an Acanthamoebidae that is composed of five distinct genera. This study increases the known morphological diversity of this group and shows that species of Acanthamoeba can include spore-bearing stages. This further illustrates the widespread nature of spore-bearing stages across the tree of Amoebozoa. REVIEWERS: This article was reviewed by Drs. Eugene Koonin, Purificacion Lopez-Garcia and Sandra Baldauf. Sandra Baldauf was nominated by Purificacion Lopez-Garcia, an Editorial Board member.
|
42
|
Hubert J, Kopecky J, Nesvorna M, Alejandra Perotti M, Erban T. Detection and localization of Solitalea-like and Cardinium bacteria in three Acarus siro populations (Astigmata: Acaridae). Experimental & Applied Acarology 2016; 70:309-327. [PMID: 27502113 DOI: 10.1007/s10493-016-0080-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/08/2016] [Accepted: 07/26/2016] [Indexed: 05/09/2023]
Abstract
Bacteria associated with mites influence their fitness, nutrition and reproduction. Previously, we found Solitalea-like (Sphingobacteriales) and Candidatus Cardinium (Cytophagales) bacteria in the stored product mite Acarus siro L. by cloning and pyrosequencing. In this study, taxon-specific primers targeting the 16S rRNA gene were used to detect and quantify the bacteria in mites and eggs of three A. siro populations. Specific probes for fluorescent in situ hybridization (FISH) were used to localize Solitalea-like and Cardinium bacteria in mite bodies. Population growth, as an indirect estimator of fitness, was used to describe the mite-bacteria interactions on (1) control diet; (2) rifampicin supplemented diet; (3) tetracycline supplemented diet; (4) rifampicin pretreated mites; (5) tetracycline pretreated mites. Solitalea-like 16S rRNA gene sequences from A. siro formed a separate cluster together with sequences from Tyrophagus putrescentiae. qPCR analysis indicated that the number of Solitalea-like bacteria 16S rRNA gene copies was ca. 100× higher than that of Cardinium and that the numbers differed between populations. FISH analysis localized Solitalea-like bacteria in the parenchymal tissues, mesodeum and food bolus of larvae, nymphs and adults. Solitalea-like, but not Cardinium, bacteria were detected by taxon-specific primers in mites and eggs of all three investigated populations. None of the antibiotic treatments eliminated Solitalea-like bacteria in the A. siro populations tested. Rifampicin pretreatment significantly decreased the population growth. The numbers of Solitalea-like bacteria did not correlate with the population growth as a fitness indicator. This study demonstrated that A. siro can host Solitalea-like bacteria either alone or together with Cardinium. We suggest that Solitalea-like bacteria are shared by vertical transfer in A. siro populations.
Affiliation(s)
- Jan Hubert
- Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 16106, Czech Republic
- Jan Kopecky
- Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 16106, Czech Republic
- Marta Nesvorna
- Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 16106, Czech Republic
- M Alejandra Perotti
- Ecology and Evolutionary Biology Section, School of Biological Sciences, University of Reading, Whiteknights, Reading, Berkshire, RG6 6AS, UK
- Tomas Erban
- Crop Research Institute, Drnovska 507/73, Prague 6-Ruzyne, 16106, Czech Republic
|
43
|
Rodrigue N, Lartillot N. Detecting Adaptation in Protein-Coding Genes Using a Bayesian Site-Heterogeneous Mutation-Selection Codon Substitution Model. Mol Biol Evol 2016; 34:204-214. [PMID: 27744408 PMCID: PMC5854120 DOI: 10.1093/molbev/msw220] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 02/04/2023] Open
Abstract
Codon substitution models have traditionally attempted to uncover signatures of adaptation within protein-coding genes by contrasting the rates of synonymous and non-synonymous substitutions. Another modeling approach, known as the mutation–selection framework, attempts to explicitly account for selective patterns at the amino acid level, with some approaches allowing for heterogeneity in these patterns across codon sites. Under such a model, substitutions at a given position occur at the neutral or nearly neutral rate when they are synonymous, or when they correspond to replacements between amino acids of similar fitness; substitutions from high to low (low to high) fitness amino acids have comparatively low (high) rates. Here, we study the use of such a mutation–selection framework as a null model for the detection of adaptation. Following previous works in this direction, we include a deviation parameter that has the effect of capturing the surplus, or deficit, in non-synonymous rates, relative to what would be expected under a mutation–selection modeling framework that includes a Dirichlet process approach to account for across-codon-site variation in amino acid fitness profiles. We use simulations, along with a few real data sets, to study the behavior of the approach, and find it to have good power with a low false-positive rate. Altogether, we emphasize the potential of recent mutation–selection models in the detection of adaptation, calling for further model refinements as well as large-scale applications.
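The rate behaviour described in this abstract (neutral rates between amino acids of similar fitness, low rates from high to low fitness, high rates from low to high) can be sketched with the relative fixation factor commonly used in mutation-selection models, S / (1 - exp(-S)), where S is the scaled fitness difference. This is a minimal illustration under that assumption, not the Bayesian model of the paper; the function name and the fitness values are hypothetical.

```python
import math


def relative_fixation_rate(s_scaled):
    """Fixation rate of a mutation relative to a neutral one.

    s_scaled is the scaled fitness difference (target minus source).
    Uses S / (1 - exp(-S)), with the neutral limit handled explicitly.
    """
    if abs(s_scaled) < 1e-8:
        return 1.0          # neutral limit of the formula
    if s_scaled < -700.0:
        return 0.0          # avoid overflow for extremely deleterious changes
    return s_scaled / -math.expm1(-s_scaled)


# Hypothetical scaled fitnesses for two amino acids at one site.
fitness = {"high": 2.0, "low": 0.0}
high_to_low = relative_fixation_rate(fitness["low"] - fitness["high"])   # below 1
low_to_high = relative_fixation_rate(fitness["high"] - fitness["low"])   # above 1
neutral = relative_fixation_rate(0.0)                                    # exactly 1
```

The ratio `low_to_high / high_to_low` equals `exp(2.0)` here, which is the detailed-balance property that makes the stationary frequencies proportional to the exponentiated fitnesses when mutation is symmetric.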
Affiliation(s)
- Nicolas Rodrigue
- Department of Biology, Institute of Biochemistry, and School of Mathematics and Statistics, Carleton University, Ottawa, Canada
- Nicolas Lartillot
- Université de Lyon, Laboratoire de Biométrie, Biologie Évolutive, Villeurbanne, France
|
44
|
Derelle R, López-García P, Timpano H, Moreira D. A Phylogenomic Framework to Study the Diversity and Evolution of Stramenopiles (=Heterokonts). Mol Biol Evol 2016; 33:2890-2898. [PMID: 27512113 DOI: 10.1093/molbev/msw168] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Indexed: 11/12/2022] Open
Abstract
Stramenopiles or heterokonts constitute one of the most speciose and diverse clades of protists. It includes ecologically important algae (such as diatoms or large multicellular brown seaweeds), as well as heterotrophic (e.g., bicosoecids, MAST groups) and parasitic (e.g., Blastocystis, oomycetes) species. Despite their evolutionary and ecological relevance, deep phylogenetic relationships among stramenopile groups, inferred mostly from small-subunit rDNA phylogenies, remain unresolved, especially for the heterotrophic taxa. Taking advantage of recently released stramenopile transcriptome and genome sequences, as well as data from the genomic assembly of the MAST-3 species Incisomonas marina generated in our laboratory, we have carried out the first extensive phylogenomic analysis of stramenopiles, including representatives of most major lineages. Our analyses, based on a large data set of 339 widely distributed proteins, strongly support a root of stramenopiles lying between two clades, Bigyra and Gyrista (Pseudofungi plus Ochrophyta). Additionally, our analyses challenge the Phaeista-Khakista dichotomy of photosynthetic stramenopiles (ochrophytes) as two groups previously considered to be part of the Phaeista (Pelagophyceae and Dictyochophyceae), branch with strong support with the Khakista (Bolidophyceae and Diatomeae). We propose a new classification of ochrophytes within the two groups Chrysista and Diatomista to reflect the new phylogenomic results. Our stramenopile phylogeny provides a robust phylogenetic framework to investigate the evolution and diversification of this group of ecologically relevant protists.
Affiliation(s)
- Romain Derelle
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
- Purificación López-García
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
- Hélène Timpano
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
- David Moreira
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
|
45
|
Spielman SJ, Wilke CO. Extensively Parameterized Mutation-Selection Models Reliably Capture Site-Specific Selective Constraint. Mol Biol Evol 2016; 33:2990-3002. [PMID: 27512115 DOI: 10.1093/molbev/msw171] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 02/07/2023] Open
Abstract
The mutation-selection model of coding sequence evolution has received renewed attention for its use in estimating site-specific amino acid propensities and selection coefficient distributions. Two computationally tractable mutation-selection inference frameworks have been introduced: One framework employs a fixed-effects, highly parameterized maximum likelihood approach, whereas the other employs a random-effects Bayesian Dirichlet Process approach. While both implementations follow the same model, they appear to make distinct predictions about the distribution of selection coefficients. The fixed-effects framework estimates a large proportion of highly deleterious substitutions, whereas the random-effects framework estimates that all substitutions are either nearly neutral or weakly deleterious. It remains unknown, however, how accurately each method infers evolutionary constraints at individual sites. Indeed, selection coefficient distributions pool all site-specific inferences, thereby obscuring a precise assessment of site-specific estimates. Therefore, in this study, we use a simulation-based strategy to determine how accurately each approach recapitulates the selective constraint at individual sites. We find that the fixed-effects approach, despite its extensive parameterization, consistently and accurately estimates site-specific evolutionary constraint. By contrast, the random-effects Bayesian approach systematically underestimates the strength of natural selection, particularly for slowly evolving sites. We also find that, despite the strong differences between their inferred selection coefficient distributions, the fixed- and random-effects approaches yield surprisingly similar inferences of site-specific selective constraint. We conclude that the fixed-effects mutation-selection framework provides the more reliable software platform for model application and future development.
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX; Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX; Present address: Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX; Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX
|
46
|
Hubert J, Stejskal V, Nesvorna M, Aulicky R, Kopecky J, Erban T. Differences in the Bacterial Community of Laboratory and Wild Populations of the Predatory Mite Cheyletus eruditus (Acarina: Cheyletidae) and Bacteria Transmission From Its Prey Acarus siro (Acari: Acaridae). Journal of Economic Entomology 2016; 109:1450-1457. [PMID: 27018441 DOI: 10.1093/jee/tow032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/04/2015] [Accepted: 02/03/2016] [Indexed: 06/05/2023]
Abstract
The parthenogenetic predatory mite Cheyletus eruditus (Schrank, 1781) is used for biological control of mite pests and is produced as CHEYLETIN. Although there is evidence that bacteria are mainly responsible for parthenogeny in several species of predatory mites, a description of the association between C. eruditus and specific parasitic or symbiotic bacteria is still missing. We analyzed the bacterial communities of the predator, C. eruditus, and its prey, Acarus siro L. The 16S rRNA gene was amplified, cloned, and sequenced. The selected bacterial taxa were confirmed by amplification of isolated DNA with taxon-specific primers. The 16S rRNA gene sequences from the predatory and prey mites formed a total of 20 different bacterial taxa. Of these taxa, the predator and prey shared four, six were specific to the predatory mite, and 10 were specific to the prey mite. Cardinium- and Bartonella-like bacteria were found in both mite species. The reproductive parasite Wolbachia was found only in the predatory mite, and A. siro hosted Solitalea-like (Sphingobacteriales) bacteria that were not detected in C. eruditus. We focused on Cardinium occurrence in the field samples of C. eruditus. Using Cardinium-specific primers, 128 clones were obtained. Cardinium was found in seven field samples of C. eruditus as well as in the laboratory population that was used to produce CHEYLETIN. Phylogenetic analysis of the Cardinium clones identified three separate clusters: two clusters showed high similarity to the Cardinium sequences from astigmatid mites, and one cluster contained only the clones from C. eruditus. Sequences of both Cardinium and Wolbachia were found in both adults and eggs of C. eruditus, indicating maternal transfer of these endosymbiotic bacteria.
|
47
|
Spielman SJ, Wilke CO. Pyvolve: A Flexible Python Module for Simulating Sequences along Phylogenies. PLoS One 2015; 10:e0139047. [PMID: 26397960 PMCID: PMC4580465 DOI: 10.1371/journal.pone.0139047] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 06/15/2015] [Accepted: 09/07/2015] [Indexed: 11/19/2022] Open
Abstract
We introduce Pyvolve, a flexible Python module for simulating genetic data along a phylogeny using continuous-time Markov models of sequence evolution. Easily incorporated into Python bioinformatics pipelines, Pyvolve can simulate sequences according to most standard models of nucleotide, amino-acid, and codon sequence evolution. All model parameters are fully customizable. Users can additionally specify custom evolutionary models, with custom rate matrices and/or states to evolve. This flexibility makes Pyvolve a convenient framework not only for simulating sequences under a wide variety of conditions, but also for developing and testing new evolutionary models. Pyvolve is an open-source project under a FreeBSD license, and it is available for download, along with a detailed user-manual and example scripts, from http://github.com/sjspielman/pyvolve.
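To make concrete what simulating sequences along a phylogeny with a continuous-time Markov model involves, here is a minimal standard-library sketch. It deliberately does not use or imitate Pyvolve's API: the tree encoding (nested lists of `(subtree, branch_length)` pairs), the function names, and the choice of the Jukes-Cantor (JC69) nucleotide model are all assumptions made for illustration.

```python
import math
import random

BASES = "ACGT"


def evolve_branch(seq, t, rng):
    """Evolve seq along a branch of length t (expected substitutions per site).

    Under JC69 the probability that a site ends in a different base is
    3/4 * (1 - exp(-4t/3)), with the three alternatives equally likely.
    """
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(tree, seq, rng, tips=None):
    """Recursively evolve seq down the tree and collect the tip sequences.

    tree is either a tip name (str) or a list of (subtree, branch_length) pairs.
    """
    if tips is None:
        tips = {}
    if isinstance(tree, str):
        tips[tree] = seq
        return tips
    for child, length in tree:
        simulate(child, evolve_branch(seq, length, rng), rng, tips)
    return tips


rng = random.Random(1)                                   # fixed seed for repeatability
root = "".join(rng.choice(BASES) for _ in range(30))     # random root sequence
tree = [("A", 0.1), ([("B", 0.05), ("C", 0.05)], 0.2)]   # hypothetical 3-tip tree
alignment = simulate(tree, root, rng)                    # dict: tip name -> sequence
```

A dedicated simulator additionally handles codon and amino-acid state spaces, rate heterogeneity, and custom rate matrices, none of which this sketch attempts.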
Affiliation(s)
- Stephanie J. Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, United States of America
- Claus O. Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, United States of America
|
48
|
Abstract
Numerous computational methods exist to assess the mode and strength of natural selection in protein-coding sequences, yet how distinct methods relate to one another remains largely unknown. Here, we elucidate the relationship between two widely used phylogenetic modeling frameworks: dN/dS models and mutation-selection (MutSel) models. We derive a mathematical relationship between dN/dS and scaled selection coefficients, the focal parameters of MutSel models, and use this relationship to gain deeper insight into the behaviors, limitations, and applicabilities of these two modeling frameworks. We prove that, if all synonymous changes are neutral, standard MutSel models correspond to dN/dS ≤ 1. However, if synonymous codons differ in fitness, dN/dS can take on arbitrarily high values even if all selection is purifying. Thus, the MutSel modeling framework cannot necessarily accommodate positive, diversifying selection, while dN/dS cannot distinguish between purifying selection on synonymous codons and positive selection on amino acids. We further propose a new benchmarking strategy of dN/dS inferences against MutSel simulations and demonstrate that the widely used Goldman-Yang-style dN/dS models yield substantially biased dN/dS estimates on realistic sequence data. In contrast, the less frequently used Muse-Gaut-style models display much less bias. Strikingly, the least-biased and most precise dN/dS estimates are never found in the models with the best fit to the data, measured through both AIC and BIC scores. Thus, selecting models based on goodness-of-fit criteria can yield poor parameter estimates if the models considered do not precisely correspond to the underlying mechanism that generated the data. In conclusion, establishing mathematical links among modeling frameworks represents a novel, powerful strategy to pinpoint previously unrecognized model limitations and strengths.
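The statement that standard MutSel models correspond to dN/dS ≤ 1 when synonymous changes are neutral can be checked numerically with a small sketch. It rests on simplifying assumptions that are not taken from the paper: all single-nucleotide mutations are equally likely, fitness depends only on the encoded amino acid, stationary codon frequencies are proportional to exp(fitness), and the fixation factor is S / (1 - exp(-S)). Function names and fitness values are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, sense codons only (stop codons are dropped).
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO) if aa != "*"}


def fixation(s):
    """Relative fixation rate S / (1 - exp(-S)), equal to 1 for neutral changes."""
    if abs(s) < 1e-8:
        return 1.0
    return s / -math.expm1(-s)


def site_dnds(fitness):
    """dN/dS at one site; fitness maps amino acid -> scaled fitness (default 0)."""
    weights = {c: math.exp(fitness.get(aa, 0.0)) for c, aa in CODE.items()}
    total = sum(weights.values())
    observed = neutral = 0.0
    for codon, aa in CODE.items():
        pi = weights[codon] / total                  # stationary codon frequency
        for pos, base in product(range(3), BASES):
            if base == codon[pos]:
                continue
            other = codon[:pos] + base + codon[pos + 1:]
            if other not in CODE or CODE[other] == aa:
                continue                             # skip stop and synonymous changes
            s = fitness.get(CODE[other], 0.0) - fitness.get(aa, 0.0)
            observed += pi * fixation(s)             # nonsynonymous flux with selection
            neutral += pi                            # same flux if changes were neutral
    # Synonymous changes are neutral here, so the dS term equals 1 and drops out.
    return observed / neutral


flat = site_dnds({})                        # no selection: dN/dS is 1
constrained = site_dnds({"L": 3.0, "F": -2.0})   # hypothetical fitnesses: below 1
```

With equal fitness for every amino acid the ratio is exactly 1, and any amino-acid-level fitness differences push it below 1, in line with the abstract's claim for the case of neutral synonymous changes.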
Affiliation(s)
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
- Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cellular and Molecular Biology, The University of Texas at Austin
|
49
|
A penalized-likelihood method to estimate the distribution of selection coefficients from phylogenetic data. Genetics 2014; 197:257-71. [PMID: 24532780 DOI: 10.1534/genetics.114.162263] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 02/04/2023] Open
Abstract
We develop a maximum penalized-likelihood (MPL) method to estimate the fitnesses of amino acids and the distribution of selection coefficients (S = 2Ns) in protein-coding genes from phylogenetic data. This improves on a previous maximum-likelihood method. Various penalty functions are used to penalize extreme estimates of the fitnesses, thus correcting overfitting by the previous method. Using a combination of computer simulation and real data analysis, we evaluate the effect of the various penalties on the estimation of the fitnesses and the distribution of S. We show the new method regularizes the estimates of the fitnesses for small, relatively uninformative data sets, but it can still recover the large proportion of deleterious mutations when present in simulated data. Computer simulations indicate that as the number of taxa in the phylogeny or the level of sequence divergence increases, the distribution of S can be more accurately estimated. Furthermore, the strength of the penalty can be varied to study how informative a particular data set is about the distribution of S. We analyze three protein-coding genes (the chloroplast rubisco protein, mammal mitochondrial proteins, and an influenza virus polymerase) and show the new method recovers a large proportion of deleterious mutations in these data, even under strong penalties, confirming the distribution of S is bimodal in these real data. We recommend the use of the new MPL approach for the estimation of the distribution of S in species phylogenies of protein-coding genes.
|