1
|
Sharma S, Kumar S. Discovering Fragile Clades and Causal Sequences in Phylogenomics by Evolutionary Sparse Learning. Mol Biol Evol 2024; 41:msae131. [PMID: 38916040 PMCID: PMC11247346 DOI: 10.1093/molbev/msae131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 05/30/2024] [Accepted: 06/20/2024] [Indexed: 06/26/2024] Open
Abstract
Phylogenomic analyses of long sequences, consisting of many genes and genomic segments, reconstruct organismal relationships with high statistical confidence. But, inferred relationships can be sensitive to excluding just a few sequences. Currently, there is no direct way to identify fragile relationships and the associated individual gene sequences in species. Here, we introduce novel metrics for gene-species sequence concordance and clade probability derived from evolutionary sparse learning models. We validated these metrics using fungi, plant, and animal phylogenomic datasets, highlighting the ability of the new metrics to pinpoint fragile clades and the sequences responsible. The new approach does not necessitate the investigation of alternative phylogenetic hypotheses, substitution models, or repeated data subset analyses. Our methodology offers a streamlined approach to evaluating major inferred clades and identifying sequences that may distort reconstructed phylogenies using large datasets.
Affiliation(s)
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
|
2
|
Arasti S, Tabaghi P, Tabatabaee Y, Mirarab S. Branch Length Transforms using Optimal Tree Metric Matching. bioRxiv 2024:2023.11.13.566962. [PMID: 38746464 PMCID: PMC11092445 DOI: 10.1101/2023.11.13.566962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
The abundant discordance between evolutionary relationships across the genome has rekindled interest in ways of comparing and averaging trees on a shared leaf set. However, most attempts at reconciling trees have focused on tree topology, producing metrics for comparing topologies and methods for computing median tree topologies. Using branch lengths, however, has been more elusive, due to several challenges. Species tree branch lengths can be measured in many units, often different from gene trees. Moreover, rates of evolution change across the genome, the species tree, and specific branches of gene trees. These factors compound the stochasticity of coalescence times. Thus, branch lengths are highly heterogeneous across both the genome and the tree. For many downstream applications in phylogenomic analyses, branch lengths are as important as the topology, and yet, existing tools to compare and combine weighted trees are limited. In this paper, we make progress on the question of mapping one tree to another, incorporating both topology and branch length. We define a series of computational problems to formalize finding the best transformation of one tree to another while maintaining its topology and other constraints. We show that all these problems can be solved in quadratic time and memory using a linear algebraic formulation coupled with dynamic programming preprocessing. Our formulations lead to convex optimization problems, with efficient and theoretically optimal solutions. While many applications can be imagined for this framework, we apply it to measure species tree branch lengths in the unit of the expected number of substitutions per site while allowing divergence from ultrametricity across the tree. In these applications, our method matches or surpasses other methods designed directly for solving those problems. Thus, our approach provides a versatile toolkit that finds applications in similar evolutionary questions. 
Code availability: The software is available at https://github.com/shayesteh99/TCMM.git
Data availability: Data are available on GitHub at https://github.com/shayesteh99/TCMM-Data.git
|
3
|
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
  - Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
  - Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
  - Departmento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
|
4
|
Comte A, Tricou T, Tannier E, Joseph J, Siberchicot A, Penel S, Allio R, Delsuc F, Dray S, de Vienne DM. PhylteR: Efficient Identification of Outlier Sequences in Phylogenomic Datasets. Mol Biol Evol 2023; 40:msad234. [PMID: 37879113 PMCID: PMC10655845 DOI: 10.1093/molbev/msad234] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/13/2023] [Revised: 09/29/2023] [Accepted: 10/18/2023] [Indexed: 10/27/2023] Open
Abstract
In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. In PhylteR, these distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to ILS-induced incongruences, which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (docker and singularity).
Affiliation(s)
- Aurore Comte
  - French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
  - IRD, CIRAD, INRAE, Institut Agro, PHIM Plant Health Institute, Montpellier University, Montpellier, France
- Théo Tricou
  - Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Eric Tannier
  - Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
  - Centre de Recherches Inria de Lyon, Villeurbanne, France
- Julien Joseph
  - Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Aurélie Siberchicot
  - Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Simon Penel
  - Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Rémi Allio
  - CBGP, INRAE, CIRAD, IRD, Montpellier SupAgro, Univ. Montpellier, Montpellier, France
- Stéphane Dray
  - Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
- Damien M de Vienne
  - Université de Lyon, Université Lyon 1, UMR CNRS 5558 Laboratoire de Biométrie et Biologie Évolutive, Villeurbanne, France
|
5
|
Jiang Y, Tabaghi P, Mirarab S. Learning Hyperbolic Embedding for Phylogenetic Tree Placement and Updates. Biology 2022; 11:biology11091256. [PMID: 36138735 PMCID: PMC9495508 DOI: 10.3390/biology11091256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/11/2022] [Accepted: 08/19/2022] [Indexed: 11/20/2022]
Abstract
Simple Summary: We show how the conventional (Euclidean) deep learning methods developed for phylogenetics can benefit from using hyperbolic geometry. The results point to lowered distance distortion and better accuracy in updating trees but not necessarily for phylogenetic placement.
Abstract: Phylogenetic placement, used widely in ecological analyses, seeks to add a new species to an existing tree. A deep learning approach was previously proposed to estimate the distance between query and backbone species by building a map from gene sequences to a high-dimensional space that preserves species tree distances. They then use a distance-based placement method to place the queries on that species tree. In this paper, we examine the appropriate geometry for faithfully representing tree distances while embedding gene sequences. Theory predicts that hyperbolic spaces should provide a drastic reduction in distance distortion compared to the conventional Euclidean space. Nevertheless, hyperbolic embedding imposes its own unique challenges related to arithmetic operations, exponentially-growing functions, and limited bit precision, and we address these challenges. Our results confirm that hyperbolic embeddings have substantially lower distance errors than Euclidean space. However, these better-estimated distances do not always lead to better phylogenetic placement. We then show that the deep learning framework can be used not just to place on a backbone tree but to update it to obtain a fully resolved tree. With our hyperbolic embedding framework, species trees can be updated remarkably accurately with only a handful of genes.
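As a rough illustration of why hyperbolic geometry is attractive for representing tree distances, the sketch below computes distances in the Poincaré ball, one common model of hyperbolic space. The choice of the Poincaré model and the name `poincare_distance` are assumptions made here for illustration; the abstract does not state which model or implementation the authors use.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball.

    Illustrative helper (hypothetical name), not taken from the cited paper.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2*|u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))

origin = (0.0, 0.0)
leaf_a = (0.9, 0.0)  # two points near the boundary, in different directions
leaf_b = (0.0, 0.9)

print(poincare_distance(origin, leaf_a))
print(poincare_distance(leaf_a, leaf_b))
```

For points near the boundary, the distance between them is much closer to the sum of their distances to the origin than it would be in Euclidean space, which loosely mirrors how a path between two leaves of a tree passes through a shared ancestor.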
Affiliation(s)
- Yueyu Jiang
  - Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Puoya Tabaghi
  - Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA 92093, USA
- Siavash Mirarab
  - Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093, USA
  - Correspondence: ; Tel.: +1-858-822-6245
|
6
|
Nečas T, Kielgast J, Nagy ZT, Kusamba Chifundera Z, Gvoždík V. Systematic position of the Clicking Frog (Kassinula Laurent, 1940), the problem of chimeric sequences and the revised classification of the family Hyperoliidae. Mol Phylogenet Evol 2022; 174:107514. [PMID: 35589055 DOI: 10.1016/j.ympev.2022.107514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 04/26/2022] [Accepted: 05/07/2022] [Indexed: 11/18/2022]
Abstract
The systematics of the African frog family Hyperoliidae has undergone turbulent changes in the last decades. Representatives of several genera have not been genetically investigated, or only with limited data, and their phylogenetic positions are thus still not reliably known. This is the case of the De Witte's Clicking Frog (Kassinula wittei), which belongs to a monotypic genus. This miniature frog occurs in a poorly studied region: southeastern Democratic Republic of the Congo, northern Zambia, and Angola. So far it is not settled whether this genus belongs to the subfamily Kassininae as a relative of the genus Kassina, or to the subfamily Hyperoliinae as a relative of the genus Afrixalus. Here we present for the first time a multilocus phylogenetic reconstruction (using five nuclear and one mitochondrial marker) of the family Hyperoliidae, including Kassinula. We demonstrate with high confidence that Kassinula is a member of Hyperoliinae belonging to a clade also containing Afrixalus (sub-Saharan Africa), Heterixalus (Madagascar) and Tachycnemis (Seychelles). We find that Kassinula represents a divergent lineage (17-25 Mya), which supports its separate genus-level status, but its exact systematic position remains uncertain. We propose to name the clade to which the above four genera belong as the tribe Tachycnemini Channing, 1989. A new taxonomy of the family Hyperoliidae was recently proposed by Dubois et al. (2021: Megataxa 5, 1-738). We demonstrate here that the new taxonomy was based on a partially erroneous phylogenetic reconstruction resulting from a supermatrix analysis of chimeric DNA sequences combining data from two families, Hyperoliidae and Arthroleptidae (the case of Cryptothylax). We therefore correct the erroneous part and propose a new, revised suprageneric taxonomy of the family Hyperoliidae. We also emphasize the importance of inspecting individual genetic markers before their concatenation or coalescent-based tree reconstructions to avoid analyses of chimeric DNA sequences producing incorrect phylogenetic reconstructions, especially when phylogenetic reconstructions are used to propose taxonomies and systematic classifications.
Affiliation(s)
- Tadeáš Nečas
  - Institute of Vertebrate Biology of the Czech Academy of Sciences, Brno, Czech Republic; Department of Botany and Zoology, Faculty of Science, Masaryk University, Brno, Czech Republic.
- Jos Kielgast
  - Section for Freshwater Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark; Section for Marine Living Resources, National Institute of Aquatic Resources, Technical University of Denmark, Vejlsøvej 39, 8600 Silkeborg, Denmark
- Zacharie Kusamba Chifundera
  - Laboratory of Herpetology, Department of Biology, Natural Science Research Centre, Lwiro, Democratic Republic of the Congo; National Pedagogical University, Kinshasa, Democratic Republic of the Congo
- Václav Gvoždík
  - Institute of Vertebrate Biology of the Czech Academy of Sciences, Brno, Czech Republic; National Museum, Department of Zoology, Prague, Czech Republic.
|
7
|
Jiang Y, Balaban M, Zhu Q, Mirarab S. DEPP: Deep Learning Enables Extending Species Trees using Single Genes. Syst Biol 2022; 72:17-34. [PMID: 35485976 DOI: 10.1093/sysbio/syac031] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/18/2021] [Revised: 04/13/2022] [Accepted: 04/22/2022] [Indexed: 11/13/2022] Open
Abstract
Placing new sequences onto reference phylogenies is increasingly used for analyzing environmental samples, especially microbiomes. Existing placement methods assume that query sequences have evolved under specific models directly on the reference phylogeny. For example, they assume single-gene data (e.g., 16S rRNA amplicons) have evolved under the GTR model on a gene tree. Placement, however, often has a more ambitious goal: extending a (genome-wide) species tree given data from individual genes without knowing the evolutionary model. Addressing this challenging problem requires new directions. Here, we introduce Deep-learning Enabled Phylogenetic Placement (DEPP), an algorithm that learns to extend species trees using single genes without pre-specified models. In simulations and on real data, we show that DEPP can match the accuracy of model-based methods without any prior knowledge of the model. We also show that DEPP can update the multi-locus microbial tree-of-life with single genes with high accuracy. We further demonstrate that DEPP can combine 16S and metagenomic data onto a single tree, enabling community structure analyses that take advantage of both sources of data.
Affiliation(s)
- Yueyu Jiang
  - Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
- Metin Balaban
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, CA 92093, USA
- Qiyun Zhu
  - Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ 85281, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, CA 92093, USA
|
8
|
Al Jewari C, Baldauf SL. Conflict over the eukaryote root resides in strong outliers, mosaics and missing data sensitivity of site-specific (CAT) mixture models. Syst Biol 2022; 72:1-16. [PMID: 35412616 DOI: 10.1093/sysbio/syac029] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/15/2021] [Accepted: 04/07/2022] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic reconstruction using concatenated loci ("phylogenomics" or "supermatrix phylogeny") is a powerful tool for solving evolutionary splits that are poorly resolved in single gene/protein trees (SGTs). However, recent phylogenomic attempts to resolve the eukaryote root have yielded conflicting results, along with claims of various artefacts hidden in the data. We have investigated these conflicts using two new methods for assessing phylogenetic conflict. ConJak uses whole marker (gene or protein) jackknifing to assess deviation from a central mean for each individual sequence, while ConWin uses a sliding window to screen for incongruent protein fragments (mosaics). Both methods allow selective masking of individual sequences or sequence fragments in order to minimize missing data, an important consideration for resolving deep splits with limited data. Analyses focused on a set of 76 eukaryotic proteins of bacterial-ancestry previously used in various combinations to assess the branching order among the three major divisions of eukaryotes: Amorphea (mainly animals, fungi and Amoebozoa), Diaphoretickes (most other well-known eukaryotes and nearly all algae) and Excavata, represented here by Discoba (Jakobida, Heterolobosea, and Euglenozoa). ConJak analyses found strong outliers to be concentrated in under-sampled lineages, while ConWin analyses of Discoba, the most under-sampled of the major lineages, detected potentially incongruent fragments scattered throughout. Phylogenetic analyses of the full data using an LG-gamma model support a Discoba sister scenario (neozoan-excavate root), which rises to 99-100% bootstrap support with data masked according to either protocol. However, analyses with two site-specific (CAT) mixture models yielded widely inconsistent results and a striking sensitivity to missing data. 
The neozoan-excavate root places Amorphea and Diaphoretickes as more closely related to each other than either is to Discoba, a fundamental relationship that should remain unaffected by additional taxa.
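The idea of measuring how far each marker deviates from a central mean can be illustrated with a generic leave-one-out calculation. This is not ConJak itself: the function name `leave_one_out_deviation`, the input format, and the example values are all hypothetical, and the abstract gives no implementation details. The sketch scores each marker by how far its value lies from the mean of the remaining markers, in units of their standard deviation.

```python
from statistics import mean, pstdev

def leave_one_out_deviation(scores):
    """Map each marker to its deviation from the mean of all other markers,
    expressed in units of those other markers' (population) standard deviation.

    Hypothetical helper for illustration; not the published ConJak method.
    """
    if len(scores) < 2:
        raise ValueError("need at least two markers")
    deviations = {}
    for marker, value in scores.items():
        # Leave the current marker out and summarize the rest.
        others = [v for m, v in scores.items() if m != marker]
        spread = pstdev(others)
        deviations[marker] = (value - mean(others)) / spread if spread > 0 else 0.0
    return deviations

# Made-up per-marker distances for one taxon; "geneD" is a planted outlier.
scores = {"geneA": 1.0, "geneB": 1.1, "geneC": 0.9, "geneD": 5.0}
deviations = leave_one_out_deviation(scores)
print(max(deviations, key=lambda m: abs(deviations[m])))
```

Leaving the candidate marker out of the reference mean keeps a strong outlier from inflating the very statistic it is being compared against.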
Affiliation(s)
- Caesar Al Jewari
  - Program in Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala, Sweden 75236
- Sandra L Baldauf
  - Program in Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala, Sweden 75236
|
9
|
Was the Last Bacterial Common Ancestor a Monoderm after All? Genes (Basel) 2022; 13:genes13020376. [PMID: 35205421 PMCID: PMC8871954 DOI: 10.3390/genes13020376] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/31/2021] [Revised: 02/09/2022] [Accepted: 02/15/2022] [Indexed: 12/20/2022] Open
Abstract
The very nature of the last bacterial common ancestor (LBCA), in particular the characteristics of its cell wall, is a critical issue to understand the evolution of life on earth. Although knowledge of the relationships between bacterial phyla has made progress with the advent of phylogenomics, many questions remain, including on the appearance or disappearance of the outer membrane of diderm bacteria (also called Gram-negative bacteria). The phylogenetic transition between monoderm (Gram-positive bacteria) and diderm bacteria, and the associated peptidoglycan expansion or reduction, requires clarification. Herein, using a phylogenomic tree of cultivated and characterized bacteria as an evolutionary framework and a literature review of their cell-wall characteristics, we used Bayesian ancestral state reconstruction to infer the cell-wall architecture of the LBCA. With the same phylogenomic tree, we further revisited the evolution of the division and cell-wall synthesis (dcw) gene cluster using homology- and model-based methods. Finally, extensive similarity searches were carried out to determine the phylogenetic distribution of the genes involved with the biosynthesis of the outer membrane in diderm bacteria. Quite unexpectedly, our analyses suggest that all cultivated and characterized bacteria might have evolved from a common ancestor with a monoderm cell-wall architecture. If true, this would indicate that the appearance of the outer membrane was not a unique event and that selective forces have led to the repeated adoption of such an architecture. Due to the lack of phenotypic information, our methodology cannot be applied to all extant bacteria. Consequently, our conclusion might change once enough information is made available to allow the use of an even more diverse organism selection.
|
10
|
Muñoz-Gómez SA, Susko E, Williamson K, Eme L, Slamovits CH, Moreira D, López-García P, Roger AJ. Site-and-branch-heterogeneous analyses of an expanded dataset favour mitochondria as sister to known Alphaproteobacteria. Nat Ecol Evol 2022; 6:253-262. [PMID: 35027725 DOI: 10.1038/s41559-021-01638-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 05/24/2021] [Accepted: 11/29/2021] [Indexed: 01/01/2023]
Abstract
Determining the phylogenetic origin of mitochondria is key to understanding the ancestral mitochondrial symbiosis and its role in eukaryogenesis. However, the precise evolutionary relationship between mitochondria and their closest bacterial relatives remains hotly debated. The reasons include pervasive phylogenetic artefacts as well as limited protein and taxon sampling. Here we developed a new model of protein evolution that accommodates both across-site and across-branch compositional heterogeneity. We applied this site-and-branch-heterogeneous model (MAM60 + GFmix) to a considerably expanded dataset that comprises 108 mitochondrial proteins of alphaproteobacterial origin, and novel metagenome-assembled genomes from microbial mats, microbialites and sediments. The MAM60 + GFmix model fits the data much better and agrees with analyses of compositionally homogenized datasets with conventional site-heterogenous models. The consilience of evidence thus suggests that mitochondria are sister to the Alphaproteobacteria to the exclusion of MarineProteo1 and Magnetococcia. We also show that the ancestral presence of the crista-developing mitochondrial contact site and cristae organizing system (a mitofilin-domain-containing Mic60 protein) in mitochondria and the Alphaproteobacteria only supports their close relationship.
Affiliation(s)
- Sergio A Muñoz-Gómez
  - Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Orsay, France.
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Kelsey Williamson
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Laura Eme
  - Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Orsay, France
- Claudio H Slamovits
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- David Moreira
  - Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Orsay, France
- Andrew J Roger
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
|
11
|
Liston A, Weitemier KA, Letelier L, Podani J, Zong Y, Liu L, Dickinson TA. Phylogeny of Crataegus (Rosaceae) based on 257 nuclear loci and chloroplast genomes: evaluating the impact of hybridization. PeerJ 2021; 9:e12418. [PMID: 34754629 PMCID: PMC8555502 DOI: 10.7717/peerj.12418] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/04/2021] [Accepted: 10/10/2021] [Indexed: 11/20/2022] Open
Abstract
Background: Hawthorn species (Crataegus L.; Rosaceae tribe Maleae) form a well-defined clade comprising five subgeneric groups readily distinguished using either molecular or morphological data. While multiple subsidiary groups (taxonomic sections, series) are recognized within some subgenera, the number of and relationships among species in these groups are subject to disagreement. Gametophytic apomixis and polyploidy are prevalent in the genus, and disagreement concerns whether and how apomictic genotypes should be recognized taxonomically. Recent studies suggest that many polyploids arise from hybridization between members of different infrageneric groups.
Methods: We used target capture and high throughput sequencing to obtain nucleotide sequences for 257 nuclear loci and nearly complete chloroplast genomes from a sample of hawthorns representing all five currently recognized subgenera. Our sample is structured to include two examples of intersubgeneric hybrids and their putative diploid and tetraploid parents. We queried the alignment of nuclear loci directly for evidence of hybridization, and compared individual gene trees with each other, and with both the maximum likelihood plastome tree and the nuclear concatenated and multilocus coalescent-based trees. Tree comparisons provided a promising, if challenging (because of the number of comparisons involved) method for visualizing variation in tree topology. We found it useful to deploy comparisons based not only on tree-tree distances but also on a metric of tree-tree concordance that uses extrinsic information about the relatedness of the terminals in comparing tree topologies.
Results: We obtained well-supported phylogenies from plastome sequences and from a minimum of 244 low copy-number nuclear loci. These are consistent with a previous morphology-based subgeneric classification of the genus. Despite the high heterogeneity of individual gene trees, we corroborate earlier evidence for the importance of hybridization in the evolution of Crataegus. Hybridization between subgenus Americanae and subgenus Sanguineae was documented for the origin of Sanguineae tetraploids, but not for a tetraploid Americanae species. This is also the first application of target capture probes designed with apple genome sequence. We successfully assembled 95% of 257 loci in Crataegus, indicating their potential utility across the genera of the apple tribe.
Affiliation(s)
- Aaron Liston
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States of America
- Kevin A Weitemier
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States of America; Department of Fisheries and Wildlife, Oregon State University, Corvallis, OR, United States of America
- Lucas Letelier
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States of America
- János Podani
  - Department of Plant Systematics, Ecology and Theoretical Biology, Eötvös Lorand University, Budapest, Hungary
- Yu Zong
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States of America; College of Chemistry & Life Sciences, Zhejiang Normal University, Jinhua, Zhejiang, China
- Lang Liu
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Timothy A Dickinson
  - Department of Natural History, Royal Ontario Museum, Toronto, Ontario, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
|
12
|
Mongiardino Koch N. Phylogenomic Subsampling and the Search for Phylogenetically Reliable Loci. Mol Biol Evol 2021; 38:4025-4038. [PMID: 33983409 DOI: 10.1101/2021.02.13.431075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/21/2023] Open
Abstract
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated by either computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top performing when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still widely differ in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.
|
14
|
Zhang C, Zhao Y, Braun EL, Mirarab S. TAPER: Pinpointing errors in multiple sequence alignments despite varying rates of evolution. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13696] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Affiliation(s)
- Chao Zhang
  - Bioinformatics and Systems Biology Program, University of California San Diego, CA, USA
- Yiming Zhao
  - Electrical and Computer Engineering Department, University of California San Diego, CA, USA
- Edward L. Braun
  - Department of Biology and Genetics Institute, University of Florida, Gainesville, FL, USA
- Siavash Mirarab
  - Electrical and Computer Engineering Department, University of California San Diego, CA, USA
|
17
|
Hait JM, Cao G, Kastanis G, Yin L, Pettengill JB, Tallent SM. Evaluation of Virulence Determinants Using Whole-Genome Sequencing and Phenotypic Biofilm Analysis of Outbreak-Linked Staphylococcus aureus Isolates. Front Microbiol 2021; 12:687625. [PMID: 34349741 PMCID: PMC8328053 DOI: 10.3389/fmicb.2021.687625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/29/2021] [Accepted: 05/24/2021] [Indexed: 01/22/2023] Open
Abstract
Biofilms are a frequent cause of food contamination by potentially pathogenic bacteria, such as Staphylococcus aureus. Given its vast role in human disease, the possible impact of biofilm-producing S. aureus isolates in a food processing environment is evident. Sixty-nine S. aureus isolates collected from one firm following multiple staphylococcal food poisoning outbreak investigations were utilized for this analysis. Strain evaluations were performed to establish virulence determinants and the evolutionary relationships using data generated by shotgun whole-genome sequencing (WGS), along with end point polymerase chain reaction (PCR) and in vitro phenotypic assessments. S. aureus isolates were grouped into six well-supported clades in the phylogenetic tree, with the relationships within the clades indicating a strong degree of clonal structure. Our analysis identified four major sequence types (47.8% ST1, 31.9% ST45, 7.2% ST5, and 7.2% ST30) and two major spa types (47.8% t127 and 29.0% t3783). Extrapolated staphylococcal enterotoxin (SE) analysis found that all isolates were positive for at least 1 of the 23 SEs and/or SE-like toxin genes. Enterotoxigenic assessments found that 93% of the isolates expressed a classical SE(A–E). SE gene concurrence was observed at 96.2%, based on PCR and WGS results. In total, 46 gene targets were distinguished. This included genes that encode for adhesion and biofilm synthesis such as clfA, clfB, bbp, ebpS, ica, bap and agr. Our evaluation found agr group III to be the most prevalent at 55%, followed by 35% for agr group I. All isolates harbored the complete intercellular adhesion operon that is recognized to contain genes responsible for the adhesion step of biofilm formation by encoding proteins involved in the synthesis of the biofilm matrix. Phenotypic characterization of biofilm formation was evaluated three times, with each test completed in triplicate and accomplished utilizing the microtiter plate method and Congo red agar (CRA). The microtiter plate results indicated moderate to high biofilm formation for 96% of the isolates, with 4% exhibiting weak to no biofilm development. CRA yielded positive to intermediate results for all isolates. The potential to inadvertently transfer pathogenic bacteria from the environment into food products creates challenges for any firm and may result in adulterated food.
Affiliation(s)
- Jennifer M Hait
  - Division of Microbiology, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Office of Regulatory Science, College Park, MD, United States
- Guojie Cao
  - Division of Microbiology, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Office of Regulatory Science, College Park, MD, United States
- George Kastanis
  - Division of Microbiology, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Office of Regulatory Science, College Park, MD, United States
- Lanlan Yin
  - Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Office of Analytics and Outreach, College Park, MD, United States
- James B Pettengill
  - Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Office of Analytics and Outreach, College Park, MD, United States
- Sandra M Tallent
  - Division of Microbiology, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Office of Regulatory Science, College Park, MD, United States
|
18
|
Knyshov A, Gordon ERL, Weirauch C. New alignment-based sequence extraction software (ALiBaSeq) and its utility for deep level phylogenetics. PeerJ 2021; 9:e11019. [PMID: 33850647 PMCID: PMC8019319 DOI: 10.7717/peerj.11019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Accepted: 02/06/2021] [Indexed: 01/03/2023] Open
Abstract
Despite many bioinformatic solutions for analyzing sequencing data, few options exist for targeted sequence retrieval from whole genomic sequencing (WGS) data with the ultimate goal of generating a phylogeny. Available tools especially struggle at deep phylogenetic levels and necessitate amino-acid space searches, which may increase rates of false positive results. Many tools are also difficult to install and may lack adequate user resources. Here, we describe a program that uses freely available similarity search tools to find homologs in assembled WGS data with unparalleled freedom to modify parameters. We evaluate its performance compared to other commonly used bioinformatics tools on two divergent insect species (>200 My) for which annotated genomes exist, and on one large set each of highly conserved and more variable loci. Our software is capable of retrieving orthologs from well-curated or unannotated, low or high depth shotgun, and target capture assemblies as well or better than other software as assessed by recovering the most genes with maximal coverage and with a low rate of false positives throughout all datasets. When assessing this combination of criteria, ALiBaSeq is frequently the best evaluated tool for gathering the most comprehensive and accurate phylogenetic alignments on all types of data tested. The software (implemented in Python), tutorials, and manual are freely available at https://github.com/AlexKnyshov/alibaseq.
Affiliation(s)
- Alexander Knyshov
  - Department of Entomology, University of California, Riverside, Riverside, CA, USA
- Eric R L Gordon
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Christiane Weirauch
  - Department of Entomology, University of California, Riverside, Riverside, CA, USA
|
19
|
Scossa F, Fernie AR. Ancestral sequence reconstruction - An underused approach to understand the evolution of gene function in plants? Comput Struct Biotechnol J 2021; 19:1579-1594. [PMID: 33868595 PMCID: PMC8039532 DOI: 10.1016/j.csbj.2021.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/03/2021] [Revised: 03/04/2021] [Accepted: 03/06/2021] [Indexed: 02/06/2023] Open
Abstract
Whilst substantial research effort has been placed on understanding the interactions of plant proteins with their molecular partners, relatively few studies in plants - by contrast to work in other organisms - address how these interactions evolve. It is thought that ancestral proteins were more promiscuous than modern proteins and that specificity often evolved following gene duplication and subsequent functional refining. However, ancestral protein resurrection studies have found that some modern proteins have evolved de novo from ancestors lacking those functions. Intriguingly, the new interactions evolved as a consequence of just a few mutations and, as such, acquisition of new functions appears to be neither difficult nor rare, however, only a few of them are incorporated into biological processes before they are lost to subsequent mutations. Here, we detail the approach of ancestral sequence reconstruction (ASR), providing a primer to reconstruct the sequence of an ancestral gene. We will present case studies from a range of different eukaryotes before discussing the few instances where ancestral reconstructions have been used in plants. As ASR is used to dig into the remote evolutionary past, we will also present some alternative genetic approaches to investigate molecular evolution on shorter timescales. We argue that the study of plant secondary metabolism is particularly well suited for ancestral reconstruction studies. Indeed, its ancient evolutionary roots and highly diverse landscape provide an ideal context in which to address the focal issue around the emergence of evolutionary novelties and how this affects the chemical diversification of plant metabolism.
Key Words
- APR, ancestral protein resurrection
- ASR, ancestral sequence reconstruction
- Ancestral sequence reconstruction
- CDS, coding sequence
- Evolution
- GR, glucocorticoid receptor
- GWAS, genome wide association study
- Genomics
- InDel, insertion/deletion
- MCMC, Markov Chain Monte Carlo
- ML, maximum likelihood
- MP, maximum parsimony
- MR, mineralocorticoid receptor
- MSA, multiple sequence alignment
- Metabolism
- NJ, neighbor-joining
- Phylogenetics
- Plants
- SFS, site frequency spectrum
Affiliation(s)
- Federico Scossa
  - Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
  - Council for Agricultural Research and Economics (CREA), Research Centre for Genomics and Bioinformatics (CREA-GB), Rome, Italy
- Alisdair R. Fernie
  - Max-Planck-Institute of Molecular Plant Physiology (MPI-MP), 14476 Potsdam-Golm, Germany
  - Center of Plant Systems Biology and Biotechnology (CPSBB), Plovdiv, Bulgaria
|
20
|
Ghannoum S, Leoncio Netto W, Fantini D, Ragan-Kelley B, Parizadeh A, Jonasson E, Ståhlberg A, Farhan H, Köhn-Luque A. DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics. Int J Mol Sci 2021; 22:1399. [PMID: 33573289 PMCID: PMC7866810 DOI: 10.3390/ijms22031399] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/01/2020] [Revised: 01/08/2021] [Accepted: 01/28/2021] [Indexed: 02/08/2023] Open
Abstract
The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages in order to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. The pipeline integrates multiple scRNA-seq packages and allows biomarker discovery with decision trees and gene enrichment analysis in a network context using single-cell sequencing read counts through clustering and differential analysis. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second one is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that integrate in a sequential narrative R code with explanatory text and output data and images. The notebooks help R users understand the different steps of the pipeline and guide them in exploring their scRNA-seq data. We also provide a cloud version using Binder that allows the execution of the pipeline without the need to download R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, in order to do meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.
Affiliation(s)
- Salim Ghannoum
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway
  - Correspondence: S.G., A.K.-L.; Tel.: +46-76-5770129 (S.G.)
- Waldir Leoncio Netto
  - Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, 0372 Oslo, Norway
- Damiano Fantini
  - Department of Urology, Northwestern University, Chicago, IL 60611, USA
- Amirabbas Parizadeh
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway
- Emma Jonasson
  - Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, SE-41390 Gothenburg, Sweden
- Anders Ståhlberg
  - Sahlgrenska Center for Cancer Research, Department of Laboratory Medicine, Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, SE-41390 Gothenburg, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, SE-41390 Gothenburg, Sweden
  - Department of Clinical Genetics and Genomics, Sahlgrenska University Hospital, SE-41390 Gothenburg, Sweden
- Hesso Farhan
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, University of Oslo, 0372 Oslo, Norway
- Alvaro Köhn-Luque
  - Oslo Centre for Biostatistics and Epidemiology, Faculty of Medicine, University of Oslo, 0372 Oslo, Norway
  - Correspondence: S.G., A.K.-L.; Tel.: +46-76-5770129 (S.G.)
|
21
|
Monteil CL, Grouzdev DS, Perrière G, Alonso B, Rouy Z, Cruveiller S, Ginet N, Pignol D, Lefevre CT. Repeated horizontal gene transfers triggered parallel evolution of magnetotaxis in two evolutionary divergent lineages of magnetotactic bacteria. ISME J 2020; 14:1783-1794. [PMID: 32296121 PMCID: PMC7305187 DOI: 10.1038/s41396-020-0647-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/27/2019] [Revised: 03/21/2020] [Accepted: 03/24/2020] [Indexed: 12/27/2022]
Abstract
Under the same selection pressures, two genetically divergent populations may evolve in parallel toward the same adaptive solutions. Here, we hypothesized that magnetotaxis (i.e., magnetically guided chemotaxis) represents a key adaptation to micro-oxic habitats in aquatic sediments and that its parallel evolution homogenized the phenotypes of two evolutionary divergent clusters of freshwater spirilla. All magnetotactic bacteria affiliated to the Magnetospirillum genus (Alphaproteobacteria class) biomineralize the same magnetic particle chains and share highly similar physiological and ultrastructural features. We looked for the processes that could have contributed at shaping such an evolutionary pattern by reconciling species and gene trees using newly sequenced genomes of Magnetospirillum related bacteria. We showed that repeated horizontal gene transfers and homologous recombination of entire operons contributed to the parallel evolution of magnetotaxis. We propose that such processes could represent a more parsimonious and rapid solution for adaptation compared with independent and repeated de novo mutations, especially in the case of traits as complex as magnetotaxis involving tens of interacting proteins. Besides strengthening the idea about the importance of such a function in micro-oxic habitats, these results reinforce previous observations in experimental evolution suggesting that gene flow could alleviate clonal interference and speed up adaptation under some circumstances.
Affiliation(s)
- Caroline L Monteil
  - Aix-Marseille University, CEA, CNRS, Biosciences and Biotechnologies Institute of Aix-Marseille, Saint Paul lez Durance, France
- Denis S Grouzdev
  - Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Guy Perrière
  - Laboratoire de Biométrie et Biologie Evolutive, CNRS, UMR5558, Université Claude Bernard - Lyon 1, 69622, Villeurbanne, France
- Béatrice Alonso
  - Aix-Marseille University, CEA, CNRS, Biosciences and Biotechnologies Institute of Aix-Marseille, Saint Paul lez Durance, France
- Zoé Rouy
  - LABGeM, Genomique Metabolique, CEA, Genoscope, Institut Francois Jacob, CNRS, Universite d'Evry, Universite Paris-Saclay, Evry, France
- Stéphane Cruveiller
  - LABGeM, Genomique Metabolique, CEA, Genoscope, Institut Francois Jacob, CNRS, Universite d'Evry, Universite Paris-Saclay, Evry, France
- Nicolas Ginet
  - Aix Marseille University, CNRS, LCB, Marseille, France
- David Pignol
  - Aix-Marseille University, CEA, CNRS, Biosciences and Biotechnologies Institute of Aix-Marseille, Saint Paul lez Durance, France
- Christopher T Lefevre
  - Aix-Marseille University, CEA, CNRS, Biosciences and Biotechnologies Institute of Aix-Marseille, Saint Paul lez Durance, France
|
22
|
Yasui N, Vogiatzis C, Yoshida R, Fukumizu K. imPhy: Imputing Phylogenetic Trees with Missing Information Using Mathematical Programming. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1222-1230. [PMID: 30507538 DOI: 10.1109/tcbb.2018.2884459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Advances in modern genomics have allowed researchers to apply phylogenetic analyses on a genome-wide scale. While large volumes of genomic data can be generated cheaply and quickly, data missingness is a non-trivial and somewhat expected problem. Since the available information is often incomplete for a given set of genetic loci and individual organisms, a large proportion of trees that depict the evolutionary history of a single genetic locus, called gene trees, fail to contain all individuals. Data incompleteness causes difficulties in data collection, information extraction, and gene tree inference. Furthermore, identifying outlying gene trees, which can represent horizontal gene transfers, gene duplications, or hybridizations, is difficult when data is missing from the gene trees. The typical approach is to remove all individuals with missing data from the gene trees, and focus the analysis on individuals whose information is fully available - a huge loss of information. In this work, we propose and design an optimization-based imputation approach to infer the missing distances between leaves in a set of gene trees via a mixed integer non-linear programming model. We also present a new research pipeline, imPhy, that can (i) simulate a set of gene trees with leaves randomly missing in each tree, (ii) impute the missing pairwise distances in each gene tree, (iii) reconstruct the gene trees using the Neighbor Joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) methods, and (iv) analyze and report the efficiency of the reconstruction. To impute the missing leaves, we employ our newly proposed non-linear programming framework, and demonstrate its capability in reconstructing gene trees with incomplete information in both simulated and empirical datasets. In the empirical datasets apicomplexa and lungfish, our imputation has very small normalized mean square errors, even in the extreme case where 50 percent of the individuals in each gene tree are missing. Data, software, and user manuals can be found at https://github.com/yasuiniko/imPhy.
|
23
|
Abstract
Knowing phylogenetic relationships among species is fundamental for many studies in biology. An accurate phylogenetic tree underpins our understanding of the major transitions in evolution, such as the emergence of new body plans or metabolism, and is key to inferring the origin of new genes, detecting molecular adaptation, understanding morphological character evolution and reconstructing demographic changes in recently diverged species. Although data are ever more plentiful and powerful analysis methods are available, there remain many challenges to reliable tree building. Here, we discuss the major steps of phylogenetic analysis, including identification of orthologous genes or proteins, multiple sequence alignment, and choice of substitution models and inference methodologies. Understanding the different sources of errors and the strategies to mitigate them is essential for assembling an accurate tree of life.
|
24
|
Muñoz-Gómez SA, Durnin K, Eme L, Paight C, Lane CE, Saffo MB, Slamovits CH. Nephromyces Represents a Diverse and Novel Lineage of the Apicomplexa That Has Retained Apicoplasts. Genome Biol Evol 2019; 11:2727-2740. [PMID: 31328784 PMCID: PMC6777426 DOI: 10.1093/gbe/evz155] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 07/16/2019] [Indexed: 12/13/2022] Open
Abstract
A most interesting exception within the parasitic Apicomplexa is Nephromyces, an extracellular, probably mutualistic, endosymbiont found living inside molgulid ascidian tunicates (i.e., sea squirts). Even though Nephromyces is now known to be an apicomplexan, many other questions about its nature remain unanswered. To gain further insights into the biology and evolutionary history of this unusual apicomplexan, we aimed to 1) find the precise phylogenetic position of Nephromyces within the Apicomplexa, 2) search for the apicoplast genome of Nephromyces, and 3) infer the major metabolic pathways in the apicoplast of Nephromyces. To do this, we sequenced a metagenome and a metatranscriptome from the molgulid renal sac, the specialized habitat where Nephromyces thrives. Our phylogenetic analyses of conserved nucleus-encoded genes robustly suggest that Nephromyces is a novel lineage sister to the Hematozoa, which comprises both the Haemosporidia (e.g., Plasmodium) and the Piroplasmida (e.g., Babesia and Theileria). Furthermore, a survey of the renal sac metagenome revealed 13 small contigs that closely resemble the genomes of the nonphotosynthetic reduced plastids, or apicoplasts, of other apicomplexans. We show that these apicoplast genomes correspond to a diverse set of most closely related but genetically divergent Nephromyces lineages that co-inhabit a single tunicate host. In addition, the apicoplast of Nephromyces appears to have retained all biosynthetic pathways inferred to have been ancestral to parasitic apicomplexans. Our results shed light on the evolutionary history of the only probably mutualistic apicomplexan known, Nephromyces, and provide context for a better understanding of its life style and intricate symbiosis.
Affiliation(s)
- Sergio A Muñoz-Gómez
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Keira Durnin
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
- Laura Eme
  - Unité d'Ecologie, Systématique et Evolution, CNRS, Université Paris-Sud, France
- Mary B Saffo
  - Smithsonian National Museum of Natural History, Washington, District of Columbia
- Claudio H Slamovits
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
|
25
|
Hamilton CA, St Laurent RA, Dexter K, Kitching IJ, Breinholt JW, Zwick A, Timmermans MJTN, Barber JR, Kawahara AY. Phylogenomics resolves major relationships and reveals significant diversification rate shifts in the evolution of silk moths and relatives. BMC Evol Biol 2019; 19:182. [PMID: 31533606 PMCID: PMC6751749 DOI: 10.1186/s12862-019-1505-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/18/2019] [Accepted: 08/29/2019] [Indexed: 03/13/2023] Open
Abstract
BACKGROUND Silkmoths and their relatives constitute the ecologically and taxonomically diverse superfamily Bombycoidea, which includes some of the most charismatic species of Lepidoptera. Despite displaying spectacular forms and diverse ecological traits, relatively little attention has been given to understanding their evolution and drivers of their diversity. To begin to address this problem, we created a new Bombycoidea-specific Anchored Hybrid Enrichment (AHE) probe set and sampled up to 571 loci for 117 taxa across all major lineages of the Bombycoidea, with a newly developed DNA extraction protocol that allows Lepidoptera specimens to be readily sequenced from pinned natural history collections. RESULTS The well-supported tree was overall consistent with prior morphological and molecular studies, although some taxa were misplaced. The bombycid Arotros Schaus was formally transferred to Apatelodidae. We identified important evolutionary patterns (e.g., morphology, biogeography, and differences in speciation and extinction), and our analysis of diversification rates highlights the stark increases that exist within the Sphingidae (hawkmoths) and Saturniidae (wild silkmoths). CONCLUSIONS Our study establishes a backbone for future evolutionary, comparative, and taxonomic studies of Bombycoidea. We postulate that the rate shifts identified are due to the well-documented bat-moth "arms race". Our research highlights the flexibility of AHE to generate genomic data from a wide range of museum specimens, both age and preservation method, and will allow researchers to tap into the wealth of biological data residing in natural history collections around the globe.
Affiliation(s)
- C A Hamilton
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA.
- Department of Entomology, Plant Pathology & Nematology, University of Idaho, Moscow, ID, 83844, USA.
| | - R A St Laurent
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
| | - K Dexter
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
| | - I J Kitching
- Department of Life Sciences, Natural History Museum, Cromwell Road, London, SW7 5BD, UK
| | - J W Breinholt
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA
- RAPiD Genomics, 747 SW 2nd Avenue #314, Gainesville, FL, 32601, USA
| | - A Zwick
- Australian National Insect Collection, CSIRO, Clunies Ross St, Acton, ACT, Canberra, 2601, Australia
| | - M J T N Timmermans
- Department of Natural Sciences, Middlesex University, The Burroughs, London, NW4 4BT, UK
| | - J R Barber
- Department of Biological Sciences, Boise State University, Boise, ID, 83725, USA
| | - A Y Kawahara
- Florida Museum of Natural History, University of Florida, Gainesville, FL, 32611, USA.
|
26
|
Flandrois JP, Brochier-Armanet C, Briolay J, Abrouk D, Schwob G, Normand P, Fernandez MP. Taxonomic assignment of uncultured prokaryotes with long range PCR targeting the spectinomycin operon. Res Microbiol 2019; 170:280-287. [PMID: 31279085 DOI: 10.1016/j.resmic.2019.06.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/29/2018] [Revised: 05/02/2019] [Accepted: 06/25/2019] [Indexed: 11/28/2022]
Abstract
The taxonomic assignment of uncultured prokaryotes to known taxa is a major challenge in microbial systematics. It usually relies on the phylogenetic analysis of the ribosomal small subunit RNA or a few housekeeping genes. Recent work has identified ribosomal proteins as valuable markers for systematics and, due to the boom in complete genome sequencing, their use has become widespread. Yet, in the case of uncultured strains, for which complete genome sequences cannot be easily obtained, sequencing many markers is complicated and time-consuming. Taking advantage of the organization of ribosomal protein coding genes in large gene clusters, we amplified a 32 kb conserved region encompassing the spectinomycin (spc) operon using long range PCR from isolated and from uncultured nodular endophytic Frankia strains. The phylogenetic analysis of the 27 ribosomal protein genes contained in this region provided a robust phylogenetic tree consistent with phylogenies based on larger sets of markers, indicating that this subset of ribosomal proteins contains enough phylogenetic signal to address systematic issues. This work shows that long range PCR could break down the barrier preventing the use of ribosomal proteins as phylogenetic markers when complete genome sequences cannot be easily obtained.
Affiliation(s)
- Jean-Pierre Flandrois
- Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622, Villeurbanne, France.
| | - Céline Brochier-Armanet
- Université de Lyon, Université Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, F-69622, Villeurbanne, France.
| | - Jérôme Briolay
- Université de Lyon, Université Lyon 1, DTAMB, Villeurbanne, France.
| | - Danis Abrouk
- Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France.
| | - Guillaume Schwob
- Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France.
| | - Philippe Normand
- Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France.
| | - Maria P Fernandez
- Université de Lyon, Université Lyon 1, CNRS, UMR5557, INRA, UMR1418, Laboratoire d'Écologie Microbienne, Villeurbanne, France.
|
27
|
Duprey A, Taib N, Leonard S, Garin T, Flandrois JP, Nasser W, Brochier-Armanet C, Reverchon S. The phytopathogenic nature of Dickeya aquatica 174/2 and the dynamic early evolution of Dickeya pathogenicity. Environ Microbiol 2019; 21:2809-2835. [PMID: 30969462 DOI: 10.1111/1462-2920.14627] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 02/25/2019] [Revised: 04/04/2019] [Accepted: 04/08/2019] [Indexed: 12/13/2022]
Abstract
Dickeya is a genus of phytopathogenic Enterobacterales causing soft rot in a variety of plants (e.g. potato, chicory, maize). Among the species affiliated to this genus, Dickeya aquatica, described in 2014, remained particularly mysterious because it had no known host. Furthermore, while D. aquatica was proposed to represent a deep-branching species within the genus Dickeya, its precise phylogenetic position remained elusive. Here, we report the complete genome sequence of the D. aquatica type strain 174/2. We demonstrate the affinity of D. aquatica strain 174/2 for acidic fruits such as tomato and cucumber and show that exposure of this bacterium to acidic pH induces twitching motility. An in-depth phylogenomic analysis of all available Dickeya proteomes pinpoints D. aquatica as the second deepest branching lineage within this genus and reclassifies two lineages that likely correspond to new genomospecies (gs.): Dickeya gs. poaceaephila (Dickeya sp. NCPPB 569) and Dickeya gs. undicola (Dickeya sp. 2B12), together with a new putative genus, tentatively named Prodigiosinella. Finally, from comparative analyses of Dickeya proteomes, we infer the complex evolutionary history of this genus, paving the way to study the adaptive patterns and processes of Dickeya to different environmental niches and hosts. In particular, we hypothesize that the lack of xylanases and xylose degradation pathways in D. aquatica could reflect adaptation to aquatic charophyte hosts which, in contrast to land plants, do not contain xyloglucans.
Affiliation(s)
- Alexandre Duprey
- Univ Lyon, Université Claude Bernard Lyon 1, INSA-Lyon, CNRS, UMR5240, Microbiologie, Adaptation et Pathogénie, 10 Rue Raphaël Dubois, 69622, Villeurbanne, France
| | - Najwa Taib
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
| | - Simon Leonard
- Univ Lyon, Université Claude Bernard Lyon 1, INSA-Lyon, CNRS, UMR5240, Microbiologie, Adaptation et Pathogénie, 10 Rue Raphaël Dubois, 69622, Villeurbanne, France
| | - Tiffany Garin
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
| | - Jean-Pierre Flandrois
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
| | - William Nasser
- Univ Lyon, Université Claude Bernard Lyon 1, INSA-Lyon, CNRS, UMR5240, Microbiologie, Adaptation et Pathogénie, 10 Rue Raphaël Dubois, 69622, Villeurbanne, France
| | - Céline Brochier-Armanet
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS, UMR5558, Laboratoire de Biométrie et Biologie Évolutive, 43 bd du 11 novembre 1918, 69622, Villeurbanne, France
| | - Sylvie Reverchon
- Univ Lyon, Université Claude Bernard Lyon 1, INSA-Lyon, CNRS, UMR5240, Microbiologie, Adaptation et Pathogénie, 10 Rue Raphaël Dubois, 69622, Villeurbanne, France
|
28
|
Straub K, Linde M, Kropp C, Blanquart S, Babinger P, Merkl R. Sequence selection by FitSS4ASR alleviates ancestral sequence reconstruction as exemplified for geranylgeranylglyceryl phosphate synthase. Biol Chem 2019; 400:367-381. [PMID: 30763032 DOI: 10.1515/hsz-2018-0344] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/13/2018] [Accepted: 12/07/2018] [Indexed: 11/15/2022]
Abstract
For evolutionary studies, but also for protein engineering, ancestral sequence reconstruction (ASR) has become an indispensable tool. The first step of every ASR protocol is the preparation of a representative sequence set containing at most a few hundred recent homologs whose composition determines decisively the outcome of a reconstruction. A common approach for sequence selection consists of several rounds of manual recompilation that is driven by embedded phylogenetic analyses of the varied sequence sets. For ASR of a geranylgeranylglyceryl phosphate synthase, we additionally utilized FitSS4ASR, which replaces this time-consuming protocol with an efficient and more rational approach. FitSS4ASR applies orthogonal filters to a set of homologs to eliminate outlier sequences and those bearing only a weak phylogenetic signal. To demonstrate the usefulness of FitSS4ASR, we determined experimentally the oligomerization state of eight predecessors, which is a delicate and taxon-specific property. Corresponding ancestors deduced in a manual approach and by means of FitSS4ASR had the same dimeric or hexameric conformation; this concordance testifies to the efficiency of FitSS4ASR for sequence selection. FitSS4ASR-based results of two other ASR experiments were added to the Supporting Information. Program and documentation are available at https://gitlab.bioinf.ur.de/hek61586/FitSS4ASR.
Affiliation(s)
- Kristina Straub
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstraße 31, D-93040 Regensburg, Germany
| | - Mona Linde
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstraße 31, D-93040 Regensburg, Germany
| | - Cosimo Kropp
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstraße 31, D-93040 Regensburg, Germany
| | - Samuel Blanquart
- University of Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France
| | - Patrick Babinger
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstraße 31, D-93040 Regensburg, Germany
| | - Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstraße 31, D-93040 Regensburg, Germany
|
29
|
Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571 PMCID: PMC6378093 DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022] Open
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | - Alexandre Antonelli
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Gothenburg Botanical Garden, Göteborg, Sweden
| | - Christine D. Bacon
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - Krzysztof Bartoszek
- Department of Computer and Information Science, Linköping University, Linköping, Sweden
| | - Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - Stella Huynh
- Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
| | - Graham Jones
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
| | - Sangeet Lamichhaney
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | - Thomas Marcussen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
| | - Hélène Morlon
- Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
| | - Luay K. Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Bengt Oxelman
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - Bernard Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - Alexander Schliep
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
| | | | - Fernanda P. Werneck
- Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
| | - John Wiedenhoeft
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
| | - Sandi Willows-Munro
- School of Life Sciences, University of Kwazulu-Natal, Pietermaritzburg, South Africa
| | - Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
|
30
|
Cenci U, Sibbald SJ, Curtis BA, Kamikawa R, Eme L, Moog D, Henrissat B, Maréchal E, Chabi M, Djemiel C, Roger AJ, Kim E, Archibald JM. Nuclear genome sequence of the plastid-lacking cryptomonad Goniomonas avonlea provides insights into the evolution of secondary plastids. BMC Biol 2018; 16:137. [PMID: 30482201 PMCID: PMC6260743 DOI: 10.1186/s12915-018-0593-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 08/04/2018] [Accepted: 10/12/2018] [Indexed: 11/21/2022] Open
Abstract
Background: The evolution of photosynthesis has been a major driver in eukaryotic diversification. Eukaryotes have acquired plastids (chloroplasts) either directly via the engulfment and integration of a photosynthetic cyanobacterium (primary endosymbiosis) or indirectly by engulfing a photosynthetic eukaryote (secondary or tertiary endosymbiosis). The timing and frequency of secondary endosymbiosis during eukaryotic evolution are currently unclear but may be resolved in part by studying cryptomonads, a group of single-celled eukaryotes comprised of both photosynthetic and non-photosynthetic species. While cryptomonads such as Guillardia theta harbor a red algal-derived plastid of secondary endosymbiotic origin, members of the sister group Goniomonadea lack plastids. Here, we present the genome of Goniomonas avonlea, the first for any goniomonad, to address whether Goniomonadea are ancestrally non-photosynthetic or whether they lost a plastid secondarily. Results: We sequenced the nuclear and mitochondrial genomes of Goniomonas avonlea and carried out a comparative analysis of Go. avonlea, Gu. theta, and other cryptomonads. The Go. avonlea genome assembly is ~92 Mbp in size, with 33,470 predicted protein-coding genes. Interestingly, some metabolic pathways (e.g., fatty acid biosynthesis) predicted to occur in the plastid and periplastidal compartment of Gu. theta appear to operate in the cytoplasm of Go. avonlea, suggesting that metabolic redundancies were generated during the course of secondary plastid integration. Other cytosolic pathways found in Go. avonlea are not found in Gu. theta, suggesting secondary loss in Gu. theta and other plastid-bearing cryptomonads. Phylogenetic analyses revealed no evidence for algal endosymbiont-derived genes in the Go. avonlea genome. Phylogenomic analyses point to a specific relationship between Cryptista (to which cryptomonads belong) and Archaeplastida. Conclusion: We found no convincing genomic or phylogenomic evidence that Go. avonlea evolved from a secondary red algal plastid-bearing ancestor, consistent with goniomonads being ancestrally non-photosynthetic eukaryotes. The Go. avonlea genome sheds light on the physiology of heterotrophic cryptomonads and serves as an important reference point for studying the metabolic “rewiring” that took place during secondary plastid integration in the ancestor of modern-day Cryptophyceae.
Affiliation(s)
- Ugo Cenci
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Shannon J Sibbald
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Bruce A Curtis
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Ryoma Kamikawa
- Graduate School of Human and Environmental Studies, Kyoto University, Kyoto, Kyoto, 606-8501, Japan
| | - Laura Eme
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada; Present address: Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, SE-75123, Uppsala, Sweden
| | - Daniel Moog
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada; Present address: Laboratory for Cell Biology, Philipps University Marburg, Karl-von-Frisch-Str. 8, 35043, Marburg, Germany
| | - Bernard Henrissat
- Architecture et Fonction des Macromolécules Biologiques (AFMB), CNRS, Université Aix-Marseille, 163 Avenue de Luminy, 13288, Marseille, France; INRA, USC 1408 AFMB, 13288, Marseille, France; Department of Biological Sciences, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
| | - Eric Maréchal
- Laboratoire de Physiologie Cellulaire et Végétale, CNRS, CEA, INRA, Université Grenoble Alpes, Institut de Biosciences et Biotechnologies de Grenoble, CEA-Grenoble, 17 rue des Martyrs, 38000, Grenoble, France
| | - Malika Chabi
- Present address: UMR 8576 - Unité de glycobiologie structurale et fonctionnelle, Université Lille 1, 59650, Villeneuve d'Ascq, France
| | - Christophe Djemiel
- Present address: UMR 8576 - Unité de glycobiologie structurale et fonctionnelle, Université Lille 1, 59650, Villeneuve d'Ascq, France
| | - Andrew J Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada; Canadian Institute for Advanced Research, Program in Integrated Microbial Biodiversity, Toronto, Ontario, Canada
| | - Eunsoo Kim
- Division of Invertebrate Zoology & Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79 Street, New York, NY, 10024, USA
| | - John M Archibald
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 4R2, Canada; Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Nova Scotia, Canada; Canadian Institute for Advanced Research, Program in Integrated Microbial Biodiversity, Toronto, Ontario, Canada
|
31
|
Phylogenomics offers resolution of major tunicate relationships. Mol Phylogenet Evol 2018; 121:166-173. [DOI: 10.1016/j.ympev.2018.01.005] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 10/19/2017] [Revised: 12/15/2017] [Accepted: 01/08/2018] [Indexed: 02/03/2023]
|
32
|
Fisch-Muller S, Mol JHA, Covain R. An integrative framework to reevaluate the Neotropical catfish genus Guyanancistrus (Siluriformes: Loricariidae) with particular emphasis on the Guyanancistrus brevispinis complex. PLoS One 2018; 13:e0189789. [PMID: 29298344 PMCID: PMC5752014 DOI: 10.1371/journal.pone.0189789] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/12/2017] [Accepted: 12/03/2017] [Indexed: 11/29/2022] Open
Abstract
Characterizing and naming species becomes more and more challenging due to the increasing difficulty of accurately delineating specific boundaries. In this context, integrative taxonomy aims to delimit taxonomic units by leveraging the complementarity of multiple data sources (geography, morphology, genetics, etc.). However, while the theoretical framework of integrative taxonomy has been explicitly stated, methods for the simultaneous analysis of multiple data sets are poorly developed and in many cases different information sources are still explored successively. Multi-table methods developed in the field of community ecology provide such an integrative framework. In particular, multiple co-inertia analysis is flexible enough to allow the integration of morphological, distributional, and genetic data in the same analysis. We have applied this powerful approach to delimit species boundaries in a group of poorly differentiated catfishes belonging to the genus Guyanancistrus from the Guianas region of northeastern South America. Because the species G. brevispinis has been claimed to be a species complex consisting of five species, particular attention was paid to this taxon. Separate analyses indicated the presence of eight distinct species of Guyanancistrus, including five new species and one new genus. However, none of the preliminary analyses revealed different lineages within G. brevispinis, and the multi-table analysis revealed three intraspecific lineages. After taxonomic clarifications and description of the new genus, species and subspecies, a reappraisal of the biogeography of Guyanancistrus members was performed. This analysis revealed three distinct dispersals from the upper reaches of Amazonian tributaries toward coastal rivers of the Eastern Guianas Ecoregion. The central role played by the Maroni River, as gateway from the Amazon basin, was confirmed. The Maroni River was also found to be a center of speciation for Guyanancistrus (with three species and two subspecies), as well as a source of dispersal of G. brevispinis toward the other main basins of the Eastern Guianas.
Affiliation(s)
- Sonia Fisch-Muller
- Natural History Museum, Department of Herpetology and Ichthyology, Geneva, Switzerland
| | - Jan H. A. Mol
- Center for Agricultural Research in Suriname (CELOS) and Department of Biology, Anton de Kom University of Suriname, Paramaribo, Suriname
| | - Raphaël Covain
- Natural History Museum, Department of Herpetology and Ichthyology, Geneva, Switzerland
|
33
|
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
| | - Joaquim Martins
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
| | - João C Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil.
|
34
|
Weyenberg G, Yoshida R, Howe D. Normalizing Kernels in the Billera-Holmes-Vogtmann Treespace. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1359-1365. [PMID: 28113725 DOI: 10.1109/tcbb.2016.2565475] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
As costs of genome sequencing have dropped precipitously, development of efficient bioinformatic methods to analyze genome structure and evolution has become ever more urgent. For example, most published phylogenomic studies involve either massive concatenation of sequences, or informal comparisons of phylogenies inferred on a small subset of orthologous genes, neither of which provides a comprehensive overview of evolution or systematic identification of genes with unusual and interesting evolution (e.g., horizontal gene transfers, gene duplication, and subsequent neofunctionalization). We are interested in identifying such "outlying" gene trees from the set of gene trees and estimating the distribution of trees over the "tree space". This paper describes an improvement to the kdetrees algorithm, an adaptation of classical kernel density estimation to the metric space of phylogenetic trees (Billera-Holmes-Vogtmann treespace), whereby the kernel normalizing constants are estimated through the use of the novel holonomic gradient methods. As in the original kdetrees paper, we have applied kdetrees to a set of Apicomplexa genes. The analysis identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. The updated version of the kdetrees software package is available both from CRAN (the official R package system) and from the official development repository on GitHub (github.com/grady/kdetrees).
|
35
|
Derelle R, López-García P, Timpano H, Moreira D. A Phylogenomic Framework to Study the Diversity and Evolution of Stramenopiles (=Heterokonts). Mol Biol Evol 2016; 33:2890-2898. [PMID: 27512113 DOI: 10.1093/molbev/msw168] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Indexed: 11/12/2022] Open
Abstract
Stramenopiles or heterokonts constitute one of the most speciose and diverse clades of protists. This clade includes ecologically important algae (such as diatoms or large multicellular brown seaweeds), as well as heterotrophic (e.g., bicosoecids, MAST groups) and parasitic (e.g., Blastocystis, oomycetes) species. Despite their evolutionary and ecological relevance, deep phylogenetic relationships among stramenopile groups, inferred mostly from small-subunit rDNA phylogenies, remain unresolved, especially for the heterotrophic taxa. Taking advantage of recently released stramenopile transcriptome and genome sequences, as well as data from the genomic assembly of the MAST-3 species Incisomonas marina generated in our laboratory, we have carried out the first extensive phylogenomic analysis of stramenopiles, including representatives of most major lineages. Our analyses, based on a large data set of 339 widely distributed proteins, strongly support a root of stramenopiles lying between two clades, Bigyra and Gyrista (Pseudofungi plus Ochrophyta). Additionally, our analyses challenge the Phaeista-Khakista dichotomy of photosynthetic stramenopiles (ochrophytes), as two groups previously considered to be part of the Phaeista (Pelagophyceae and Dictyochophyceae) branch with strong support with the Khakista (Bolidophyceae and Diatomeae). We propose a new classification of ochrophytes within the two groups Chrysista and Diatomista to reflect the new phylogenomic results. Our stramenopile phylogeny provides a robust phylogenetic framework to investigate the evolution and diversification of this group of ecologically relevant protists.
Affiliation(s)
- Romain Derelle
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
| | - Purificación López-García
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
| | - Hélène Timpano
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
| | - David Moreira
- Unité d'Ecologie, Systématique et Evolution, Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud/Paris-Saclay, AgroParisTech, Orsay, France
| |
|
36
|
Greenwood JM, Ezquerra AL, Behrens S, Branca A, Mallet L. Current analysis of host–parasite interactions with a focus on next generation sequencing data. ZOOLOGY 2016; 119:298-306. [DOI: 10.1016/j.zool.2016.06.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/07/2015] [Revised: 06/22/2016] [Accepted: 06/22/2016] [Indexed: 01/21/2023]
|
37
|
Layer M, Rhodes JA. Phylogenetic trees and Euclidean embeddings. J Math Biol 2016; 74:99-111. [PMID: 27155875 DOI: 10.1007/s00285-016-1018-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2015] [Revised: 01/27/2016] [Indexed: 10/21/2022]
Abstract
It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
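The square-root embedding summarized in this abstract can be checked on a toy tree. The sketch below uses one elementary construction, which is an assumption here and not necessarily the argument given in the paper: each edge gets its own coordinate axis, and a taxon's coordinate on that axis is the square root of the edge length if the edge lies on the taxon's path from the root, and zero otherwise. The tree, edge names, and lengths are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical rooted tree with four taxa. Each taxon is described by the set
# of edges on its path from the root; edge names and lengths are made up.
EDGE_LENGTH = {"e1": 0.3, "e2": 0.5, "e3": 0.2, "e4": 0.7, "e5": 0.4}
ROOT_PATH = {
    "A": {"e1", "e3"},
    "B": {"e1", "e4"},
    "C": {"e2"},
    "D": {"e5"},
}
EDGES = sorted(EDGE_LENGTH)


def tree_distance(u, v):
    """Path length between two taxa: edges on exactly one of the two root paths."""
    return sum(EDGE_LENGTH[e] for e in ROOT_PATH[u] ^ ROOT_PATH[v])


def embed(u):
    """One coordinate per edge: sqrt(length) if the edge is on the root path, else 0."""
    return [math.sqrt(EDGE_LENGTH[e]) if e in ROOT_PATH[u] else 0.0 for e in EDGES]


def max_embedding_error():
    """Largest gap between Euclidean distance and sqrt(tree distance) over all pairs."""
    return max(
        abs(math.dist(embed(u), embed(v)) - math.sqrt(tree_distance(u, v)))
        for u, v in combinations(ROOT_PATH, 2)
    )


print(max_embedding_error())  # expected to be zero up to floating-point error
```

The squared Euclidean distance between two embedded taxa sums the lengths of the edges that separate them, so the Euclidean distance equals the square root of the tree distance.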
Affiliation(s)
- Mark Layer
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA
| | - John A Rhodes
- Department of Mathematics and Statistics, University of Alaska Fairbanks, Fairbanks, AK, 99775, USA.
| |
|
38
|
Mengual-Chuliá B, Bedhomme S, Lafforgue G, Elena SF, Bravo IG. Assessing parallel gene histories in viral genomes. BMC Evol Biol 2016; 16:32. [PMID: 26847371 PMCID: PMC4743424 DOI: 10.1186/s12862-016-0605-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/17/2015] [Accepted: 01/29/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND The increasing abundance of sequence data has exacerbated a long known problem: gene trees and species trees for the same terminal taxa are often incongruent. Indeed, genes within a genome have not all followed the same evolutionary path due to events such as incomplete lineage sorting, horizontal gene transfer, gene duplication and deletion, or recombination. Considering conflicts between gene trees as an obstacle, numerous methods have been developed to deal with these incongruences and to reconstruct consensus evolutionary histories of species despite the heterogeneity in the history of their genes. However, inconsistencies can also be seen as a source of information about the specific evolutionary processes that have shaped genomes. RESULTS The goal of the approach here proposed is to exploit this conflicting information: we have compiled eleven variables describing phylogenetic relationships and evolutionary pressures and submitted them to dimensionality reduction techniques to identify genes with similar evolutionary histories. To illustrate the applicability of the method, we have chosen two viral datasets, namely papillomaviruses and Turnip mosaic virus (TuMV) isolates, largely dissimilar in genome, evolutionary distance and biology. Our method pinpoints viral genes with common evolutionary patterns. In the case of papillomaviruses, gene clusters match well our knowledge on viral biology and life cycle, illustrating the potential of our approach. For the less known TuMV, our results trigger new hypotheses about viral evolution and gene interaction. CONCLUSIONS The approach here presented allows turning phylogenetic inconsistencies into evolutionary information, detecting gene assemblies with similar histories, and could be a powerful tool for comparative pathogenomics.
Affiliation(s)
- Beatriz Mengual-Chuliá
- Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain
| | - Stéphanie Bedhomme
- Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain
- Bellvitge Institute of Biomedical Research (IDIBELL), Barcelona, Spain
- Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France
| | - Guillaume Lafforgue
- Centre d'Ecologie Fonctionnelle et Evolutive, UMR CNRS 5175, Montpellier, France
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain
| | - Santiago F Elena
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, València, Spain
- I2SysBio, Consejo Superior de Investigaciones Científicas-Universitat de València, València, Spain
- The Santa Fe Institute, Santa Fe, NM, USA
| | - Ignacio G Bravo
- Infections and Cancer Laboratory, Catalan Institute of Oncology (ICO), Barcelona, Spain
- MIVEGEC (UMR CNRS 5290, IRD 224, UM), National Center for Scientific Research (CNRS), Montpellier, France
- National Center for Scientific Research (CNRS), Maladies Infectieuses et Vecteurs: Ecologie, Génétique, Evolution et Contrôle (MIVEGEC), UMR CNRS 5290, IRD 224, UM, 911 Avenue Agropolis, BP 64501, 34394, Montpellier, Cedex 5, France
| |
|
39
|
Simmons MP, Sloan DB, Gatesy J. The effects of subsampling gene trees on coalescent methods applied to ancient divergences. Mol Phylogenet Evol 2016; 97:76-89. [PMID: 26768112 DOI: 10.1016/j.ympev.2015.12.013] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 10/09/2015] [Revised: 12/03/2015] [Accepted: 12/20/2015] [Indexed: 10/22/2022]
Abstract
Gene-tree-estimation error is a major concern for coalescent methods of phylogenetic inference. We sampled eight empirical studies of ancient lineages with diverse numbers of taxa and genes for which the original authors applied one or more coalescent methods. We found that the average pairwise congruence among gene trees varied greatly both between studies and also often within a study. We recommend that presenting plots of pairwise congruence among gene trees in a dataset be treated as a standard practice for empirical coalescent studies so that readers can readily assess the extent and distribution of incongruence among gene trees. ASTRAL-based coalescent analyses generally outperformed MP-EST and STAR with respect to both internal consistency (congruence between analyses of subsamples of genes with the complete dataset of all genes) and congruence with the concatenation-based topology. We evaluated the approach of subsampling gene trees that are, on average, more congruent with other gene trees as a method to reduce artifacts caused by gene-tree-estimation errors on coalescent analyses. We suggest that this method is well suited to testing whether gene-tree-estimation error is a primary cause of incongruence between concatenation- and coalescent-based results, to reconciling conflicting phylogenetic results based on different coalescent methods, and to identifying genes affected by artifacts that may then be targeted for reciprocal illumination. We provide scripts that automate the process of calculating pairwise gene-tree incongruence and subsampling trees while accounting for differential taxon sampling among genes. Finally, we assert that multiple tree-search replicates should be implemented as a standard practice for empirical coalescent studies that apply MP-EST.
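The idea of scoring each gene tree by its average pairwise congruence and then subsampling the most congruent trees can be sketched as follows. This is not the authors' script. The congruence measure used here (one minus the normalized Robinson-Foulds distance on nontrivial splits) is an assumption, since the abstract does not name the measure. All gene trees are assumed to share the same five taxa, so differential taxon sampling is not handled. Gene names and topologies are hypothetical.

```python
from itertools import combinations

# Hypothetical unrooted gene trees on taxa A-E. Each tree is stored as its set
# of nontrivial splits, written as the side that does not contain taxon E.
GENE_TREES = {
    "gene1": {frozenset("AB"), frozenset("ABC")},
    "gene2": {frozenset("AB"), frozenset("ABC")},
    "gene3": {frozenset("AB"), frozenset("ABD")},
    "gene4": {frozenset("AC"), frozenset("ACD")},
}


def congruence(splits_a, splits_b):
    """1 - normalized Robinson-Foulds distance between two split sets."""
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 1.0
    return 1.0 - len(splits_a ^ splits_b) / total


def mean_congruence(trees):
    """Average congruence of each gene tree with every other gene tree."""
    sums = {name: 0.0 for name in trees}
    for a, b in combinations(trees, 2):
        c = congruence(trees[a], trees[b])
        sums[a] += c
        sums[b] += c
    return {name: sums[name] / (len(trees) - 1) for name in trees}


def most_congruent(trees, k):
    """Names of the k gene trees with the highest mean congruence."""
    means = mean_congruence(trees)
    return sorted(means, key=means.get, reverse=True)[:k]


print(mean_congruence(GENE_TREES))
print(most_congruent(GENE_TREES, 2))
```

A plot of the pairwise values, as the abstract recommends, would show how incongruence is distributed across genes before any subsampling.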
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
| | - Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
| | - John Gatesy
- Department of Biology, University of California, Riverside, CA 92521, USA
| |
|
40
|
Murray GGR, Weinert LA, Rhule EL, Welch JJ. The Phylogeny of Rickettsia Using Different Evolutionary Signatures: How Tree-Like is Bacterial Evolution? Syst Biol 2015; 65:265-79. [PMID: 26559010 PMCID: PMC4748751 DOI: 10.1093/sysbio/syv084] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 05/11/2015] [Accepted: 11/04/2015] [Indexed: 11/14/2022] Open
Abstract
Rickettsia is a genus of intracellular bacteria whose hosts and transmission strategies are both impressively diverse, and this is reflected in a highly dynamic genome. Some previous studies have described the evolutionary history of Rickettsia as non-tree-like, due to incongruity between phylogenetic reconstructions using different portions of the genome. Here, we reconstruct the Rickettsia phylogeny using whole-genome data, including two new genomes from previously unsampled host groups. We find that a single topology, which is supported by multiple sources of phylogenetic signal, well describes the evolutionary history of the core genome. We do observe extensive incongruence between individual gene trees, but analyses of simulations over a single topology and interspersed partitions of sites show that this is more plausibly attributed to systematic error than to horizontal gene transfer. Some conflicting placements also result from phylogenetic analyses of accessory genome content (i.e., gene presence/absence), but we argue that these are also due to systematic error, stemming from convergent genome reduction, which cannot be accommodated by existing phylogenetic methods. Our results show that, even within a single genus, tests for gene exchange based on phylogenetic incongruence may be susceptible to false positives.
Affiliation(s)
- Gemma G R Murray
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| | - Lucy A Weinert
- Department of Veterinary Medicine, University of Cambridge, Madingley Road, Cambridge CB3 0ES, UK
| | - Emma L Rhule
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| | - John J Welch
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| |
|
41
|
Pérez-Escobar OA, Balbuena JA, Gottschling M. Rumbling Orchids: How To Assess Divergent Evolution Between Chloroplast Endosymbionts and the Nuclear Host. Syst Biol 2015; 65:51-65. [DOI: 10.1093/sysbio/syv070] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 07/14/2014] [Accepted: 09/15/2015] [Indexed: 01/17/2023] Open
|
42
|
Abstract
The endosymbiotic origin of plastids from cyanobacteria was a landmark event in the history of eukaryotic life. Subsequent to the evolution of primary plastids, photosynthesis spread from red and green algae to unrelated eukaryotes by secondary and tertiary endosymbiosis. Although the movement of cyanobacterial genes from endosymbiont to host is well studied, less is known about the migration of eukaryotic genes from one nucleus to the other in the context of serial endosymbiosis. Here I explore the magnitude and potential impact of nucleus-to-nucleus endosymbiotic gene transfer in the evolution of complex algae, and the extent to which such transfers compromise our ability to infer the deep structure of the eukaryotic tree of life. In addition to endosymbiotic gene transfer, horizontal gene transfer events occurring before, during, and after endosymbioses further confound our efforts to reconstruct the ancient mergers that forged multiple lines of photosynthetic microbial eukaryotes.
|
43
|
Xu X, Dunn KA, Field C. A Robust ANOVA Approach to Estimating a Phylogeny from Multiple Genes. Mol Biol Evol 2015; 32:2186-94. [PMID: 25841490 DOI: 10.1093/molbev/msv084] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/18/2023] Open
Abstract
In this article, we address the issue of estimating the phylogenetic tree based on sequence data across a set of genes. Recognizing that the individual gene trees may not all share the same evolutionary history due to lateral gene transfer or differences in rates of evolution for instance, we develop a robust algorithm for tree estimation based on pairwise distances computed gene by gene. A robust analysis of variance (ANOVA) is used to combine the distances across all genes giving a summary distance for all genes. The tree can then be constructed using any distance method such as BIONJ. Using the weights from the robust ANOVA, we can then identify the outlying genes and taxa for further examination. As the method is based on distances, computation is much faster than maximum likelihood on the concatenated genes. It is also very straightforward to carry out a bootstrap analysis using standard methods for regression models. We test our methods in a comprehensive simulation study and apply them to three data sets recently analyzed in the literature.
Affiliation(s)
- Ximing Xu
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
| | - Katherine A Dunn
- Department of Biology, Dalhousie University, Halifax, NS, Canada
| | - Chris Field
- Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
| |
|
44
|
Abstract
The large phylogenetic distance separating eukaryotic genes and their archaeal orthologs has prevented identification of the position of the eukaryotic root in phylogenomic studies. Recently, an innovative approach has been proposed to circumvent this issue: the use as phylogenetic markers of proteins that have been transferred from bacterial donor sources to eukaryotes, after their emergence from Archaea. Using this approach, two recent independent studies have built phylogenomic datasets based on bacterial sequences, leading to different predictions of the eukaryotic root. Taking advantage of additional genome sequences from the jakobid Andalucia godoyi and the two known malawimonad species (Malawimonas jakobiformis and Malawimonas californiana), we reanalyzed these two phylogenomic datasets. We show that both datasets pinpoint the same phylogenetic position of the eukaryotic root that is between "Unikonta" and "Bikonta," with malawimonad and collodictyonid lineages on the Unikonta side of the root. Our results firmly indicate that (i) the supergroup Excavata is not monophyletic and (ii) the last common ancestor of eukaryotes was a biflagellate organism. Based on our results, we propose to rename the two major eukaryotic groups Unikonta and Bikonta as Opimoda and Diphoda, respectively.
|
45
|
Foley NM, Thong VD, Soisook P, Goodman SM, Armstrong KN, Jacobs DS, Puechmaille SJ, Teeling EC. How and why overcome the impediments to resolution: lessons from rhinolophid and hipposiderid bats. Mol Biol Evol 2014; 32:313-33. [PMID: 25433366 PMCID: PMC4769323 DOI: 10.1093/molbev/msu329] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Indexed: 12/28/2022] Open
Abstract
The phylogenetic and taxonomic relationships among the Old World leaf-nosed bats (Hipposideridae) and the closely related horseshoe bats (Rhinolophidae) remain unresolved. In this study, we generated a novel approximately 10-kb molecular data set of 19 nuclear exon and intron gene fragments for 40 bat species to elucidate the phylogenetic relationships within the families Rhinolophidae and Hipposideridae. We estimated divergence times and explored potential reasons for any incongruent phylogenetic signal. We demonstrated the effects of outlier taxa and genes on phylogenetic reconstructions and compared the relative performance of intron and exon data to resolve phylogenetic relationships. Phylogenetic analyses produced a well-resolved phylogeny, supporting the familial status of Hipposideridae and demonstrated the paraphyly of the largest genus, Hipposideros. A fossil-calibrated timetree and biogeographical analyses estimated that Rhinolophidae and Hipposideridae diverged in Africa during the Eocene approximately 42 Ma. The phylogram, the timetree, and a unique retrotransposon insertion supported the elevation of the subtribe Rhinonycterina to family level and which is diagnosed herein. Comparative analysis of diversification rates showed that the speciose genera Rhinolophus and Hipposideros underwent diversification during the Mid-Miocene Climatic Optimum. The intron versus exon analyses demonstrated the improved nodal support provided by introns for our optimal tree, an important finding for large-scale phylogenomic studies, which typically rely on exon data alone. With the recent outbreak of Middle East respiratory syndrome, caused by a novel coronavirus, the study of these species is urgent as they are considered the natural reservoir for emergent severe acute respiratory syndrome (SARS)-like coronaviruses. 
It has been shown that host phylogeny is the primary factor that determines a virus’s persistence and replicative ability, and that it can act as a predictor of newly emerging disease. Therefore, this newly resolved phylogeny can be used to direct future assessments of viral diversity and to elucidate the origin and development of SARS-like coronaviruses in mammals.
Affiliation(s)
- Nicole M Foley
- School of Biology & Environmental Science, University College Dublin, Belfield, Dublin, Ireland
| | - Vu Dinh Thong
- Institute of Ecology and Biological Resources, Vietnam Academy of Science and Technology, Hanoi, Vietnam
| | - Pipat Soisook
- Princess Maha Chakri Sirindhorn Natural History Museum, Prince of Songkla University, Hat Yai, Songkhla, Thailand
| | - Steven M Goodman
- Field Museum of Natural History, Chicago, IL, USA
- Association Vahatra, Antananarivo, Madagascar
| | - Kyle N Armstrong
- Australian Centre for Evolutionary Biology & Biodiversity, The University of Adelaide, Adelaide, South Australia, Australia
- South Australian Museum, Adelaide, South Australia, Australia
| | - David S Jacobs
- Department of Biological Sciences, University of Cape Town, Rondebosch, South Africa
| | - Sébastien J Puechmaille
- School of Biology & Environmental Science, University College Dublin, Belfield, Dublin, Ireland
- Zoological Institute and Museum, Greifswald University, Greifswald, Germany
| | - Emma C Teeling
- School of Biology & Environmental Science, University College Dublin, Belfield, Dublin, Ireland
| |
|
46
|
Schrödl M, Stöger I. A review on deep molluscan phylogeny: old markers, integrative approaches, persistent problems. J NAT HIST 2014. [DOI: 10.1080/00222933.2014.963184] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 10/24/2022]
|
47
|
Boussau B, Walton Z, Delgado JA, Collantes F, Beani L, Stewart IJ, Cameron SA, Whitfield JB, Johnston JS, Holland PW, Bachtrog D, Kathirithamby J, Huelsenbeck JP. Strepsiptera, phylogenomics and the long branch attraction problem. PLoS One 2014; 9:e107709. [PMID: 25272037 PMCID: PMC4182670 DOI: 10.1371/journal.pone.0107709] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 03/18/2014] [Accepted: 08/14/2014] [Indexed: 11/18/2022] Open
Abstract
Insect phylogeny has recently been the focus of renewed interest as advances in sequencing techniques make it possible to rapidly generate large amounts of genomic or transcriptomic data for a species of interest. However, large numbers of markers are not sufficient to guarantee accurate phylogenetic reconstruction, and the choice of the model of sequence evolution as well as adequate taxonomic sampling are as important for phylogenomic studies as they are for single-gene phylogenies. Recently, the sequence of the genome of a strepsipteran has been published and used to place Strepsiptera as sister group to Coleoptera. However, this conclusion relied on a data set that did not include representatives of Neuropterida or of coleopteran lineages formerly proposed to be related to Strepsiptera. Furthermore, it did not use models that are robust against the long branch attraction artifact. Here we have sequenced the transcriptomes of seven key species to complete a data set comprising 36 species to study the higher level phylogeny of insects, with a particular focus on Neuropteroidea (Coleoptera, Strepsiptera, Neuropterida), especially on coleopteran taxa considered as potential close relatives of Strepsiptera. Using models robust against the long branch attraction artifact we find a highly resolved phylogeny that confirms the position of Strepsiptera as a sister group to Coleoptera, rather than as an internal clade of Coleoptera, and sheds new light onto the phylogeny of Neuropteroidea.
Affiliation(s)
- Bastien Boussau
- Department of Integrative Biology, University of California, Berkeley, CA, United States of America
- Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Université de Lyon, Villeurbanne, France
| | - Zaak Walton
- Department of Integrative Biology, University of California, Berkeley, CA, United States of America
| | - Juan A. Delgado
- Departamento de Zoologia y Antropologia Fisica, Facultad de Biologia, Universidad de Murcia, Murcia, Spain
| | - Francisco Collantes
- Departamento de Zoologia y Antropologia Fisica, Facultad de Biologia, Universidad de Murcia, Murcia, Spain
| | - Laura Beani
- Dipartimento di Biologia, Università di Firenze, Sesto Fiorentino, Firenze, Italia
| | - Isaac J. Stewart
- Fisher High School, Fisher, IL, United States of America
- Department of Entomology, University of Illinois, Urbana, IL, United States of America
| | - Sydney A. Cameron
- Department of Entomology, University of Illinois, Urbana, IL, United States of America
| | - James B. Whitfield
- Department of Entomology, University of Illinois, Urbana, IL, United States of America
| | - J. Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, TX, United States of America
| | - Peter W.H. Holland
- Department of Zoology, University of Oxford, Oxford, England, United Kingdom
| | - Doris Bachtrog
- Department of Integrative Biology, University of California, Berkeley, CA, United States of America
| | | | - John P. Huelsenbeck
- Department of Integrative Biology, University of California, Berkeley, CA, United States of America
- Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
| |
|
48
|
Weyenberg G, Huggins PM, Schardl CL, Howe DK, Yoshida R. kdetrees: Non-parametric estimation of phylogenetic tree distributions. Bioinformatics 2014; 30:2280-7. [PMID: 24764459 PMCID: PMC4176058 DOI: 10.1093/bioinformatics/btu258] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 02/27/2014] [Revised: 04/04/2014] [Accepted: 04/22/2014] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION Although the majority of gene histories found in a clade of organisms are expected to be generated by a common process (e.g. the coalescent process), it is well known that numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history distinct from those of the majority of genes. Such 'outlying' gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. RESULTS We propose and implement kdetrees, a non-parametric method for estimating distributions of phylogenetic trees, with the goal of identifying trees that are significantly different from the rest of the trees in the sample. Our method compares favorably with a similar recently published method, featuring an improvement of one polynomial order of computational complexity (to quadratic in the number of trees analyzed), with simulation studies suggesting only a small penalty to classification accuracy. Application of kdetrees to a set of Apicomplexa genes identified several unreliable sequence alignments that had escaped previous detection, as well as a gene independently reported as a possible case of horizontal gene transfer. We also analyze a set of Epichloë genes, fungi symbiotic with grasses, successfully identifying a contrived instance of paralogy. AVAILABILITY AND IMPLEMENTATION Our method for estimating tree distributions and identifying outlying trees is implemented as the R package kdetrees and is available for download from CRAN.
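The general idea of flagging outlying gene trees with a kernel density estimate can be illustrated with a small sketch. This is not the kdetrees R package or its algorithm. It assumes each gene tree is summarized by a vector of pairwise leaf-to-leaf path lengths, uses a Gaussian kernel with a fixed, hypothetical bandwidth, and simply reports the tree with the lowest leave-one-out density score. The data are made up.

```python
import math


def loo_density_scores(vectors, bandwidth=1.0):
    """Leave-one-out Gaussian kernel density score for each vector."""
    scores = []
    for i, x in enumerate(vectors):
        total = 0.0
        for j, y in enumerate(vectors):
            if i != j:
                d = math.dist(x, y)  # Euclidean distance between tree summaries
                total += math.exp(-0.5 * (d / bandwidth) ** 2)
        scores.append(total / (len(vectors) - 1))
    return scores


# Hypothetical gene trees on taxa A, B, C, D, each summarized by its six
# pairwise path lengths in the order (AB, AC, AD, BC, BD, CD).
gene_trees = [
    [0.20, 1.00, 1.10, 1.00, 1.10, 0.30],  # A groups with B
    [0.25, 1.05, 1.10, 1.00, 1.05, 0.30],  # A groups with B
    [0.20, 0.90, 1.00, 0.90, 1.00, 0.35],  # A groups with B
    [1.00, 0.20, 1.10, 1.00, 0.30, 1.10],  # conflicting signal: A groups with C
]

scores = loo_density_scores(gene_trees, bandwidth=0.5)
outlier = min(range(len(scores)), key=scores.__getitem__)
print(scores)
print("lowest-density tree index:", outlier)
```

The quadratic cost mentioned in the abstract shows up here as the double loop over trees; a real analysis would also need a principled bandwidth and cutoff, which this sketch does not attempt.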
Affiliation(s)
- Grady Weyenberg
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Peter M Huggins
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Christopher L Schardl
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Daniel K Howe
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| | - Ruriko Yoshida
- Department of Statistics, University of Kentucky, Lexington, KY 40536, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, Plant Pathology Department and Department of Veterinary Science, University of Kentucky, Lexington, KY 40546, USA
| |
|
49
|
Hernández-López A, Chabrol O, Royer-Carenzi M, Merhej V, Pontarotti P, Raoult D. To tree or not to tree? Genome-wide quantification of recombination and reticulate evolution during the diversification of strict intracellular bacteria. Genome Biol Evol 2014; 5:2305-17. [PMID: 24259310 PMCID: PMC3879967 DOI: 10.1093/gbe/evt178] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022] Open
Abstract
It is well known that horizontal gene transfer (HGT) is a major force in the evolution of prokaryotes. During the adaptation of a bacterial population to a new ecological niche, and particularly for intracellular bacteria, selective pressures are shifted and ecological niches reduced, resulting in a lower rate of genetic connectivity. HGT and positive selection are therefore two important evolutionary forces in microbial pathogens that drive adaptation to new hosts. In this study, we use genomic distance analyses, phylogenomic networks, tree topology comparisons, and Bayesian inference methods to investigate to what extent HGT has occurred during the evolution of the genus Rickettsia, the effect of the use of different genomic regions in estimating reticulate evolution and HGT events, and the link of these to host range. We show that ecological specialization restricts recombination occurrence in Rickettsia, but other evolutionary processes and genome architecture are also important for the occurrence of HGT. We found that recombination, genomic rearrangements, and genome conservation all show evidence of network-like evolution at whole-genome scale. We show that reticulation occurred mainly, but not only, during the early Rickettsia radiation, and that core proteome genes of every major functional category have experienced reticulated evolution and possibly HGT. Overall, the evolution of Rickettsia bacteria has been tree-like, with evidence of HGT and reticulated evolution for around 10–25% of the core Rickettsia genome. We present evidence of extensive recombination/incomplete lineage sorting (ILS) during the radiation of the genus, probably linked with the emergence of intracellularity in a wide range of hosts.
Affiliation(s)
- Antonio Hernández-López
- Aix-Marseille Université, LATP UMR - CNRS 7353, Evolution Biologique et Modélisation, Marseille, France
| | | | | | | | | | | |
|
50
|
Montelongo T, Gómez-Zurita J. Multilocus molecular systematics and evolution in time and space of Calligrapha (Coleoptera: Chrysomelidae, Chrysomelinae). ZOOL SCR 2014. [DOI: 10.1111/zsc.12073] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/14/2022]
Affiliation(s)
- Tinguaro Montelongo
- Institut de Biologia Evolutiva (CSIC-University Pompeu Fabra); Pg. Marítim de la Barceloneta 37 08003 Barcelona Spain
| | - Jesús Gómez-Zurita
- Institut de Biologia Evolutiva (CSIC-University Pompeu Fabra); Pg. Marítim de la Barceloneta 37 08003 Barcelona Spain
| |
|