1
Bernot JP, Owen CL, Wolfe JM, Meland K, Olesen J, Crandall KA. Major Revisions in Pancrustacean Phylogeny and Evidence of Sensitivity to Taxon Sampling. Mol Biol Evol 2023; 40:msad175. [PMID: 37552897 PMCID: PMC10414812 DOI: 10.1093/molbev/msad175] [Received: 10/30/2022] [Revised: 06/14/2023] [Accepted: 06/19/2023] [Indexed: 08/10/2023]
Abstract
The clade Pancrustacea, comprising crustaceans and hexapods, is the most diverse group of animals on earth, containing over 80% of animal species and half of animal biomass. It has been the subject of several recent phylogenomic analyses, yet relationships within Pancrustacea show a notable lack of stability. Here, the phylogeny is estimated with expanded taxon sampling, particularly of malacostracans. We show that small changes in taxon sampling have large impacts on phylogenetic estimation. By analyzing identical orthologs between two slightly different taxon sets, we show that the differences in the resulting topologies are due primarily to the effects of taxon sampling on the phylogenetic reconstruction method. We compare trees resulting from our phylogenomic analyses with those from the literature to explore the large tree space of pancrustacean phylogenetic hypotheses and find that statistical topology tests reject the previously published trees in favor of the maximum likelihood trees produced here. Our results reject several clades including Caridoida, Eucarida, Multicrustacea, Vericrustacea, and Syncarida. Notably, we find Copepoda nested within Allotriocarida with high support and recover a novel relationship between decapods, euphausiids, and syncarids that we refer to as the Syneucarida. With denser taxon sampling, we find Stomatopoda sister to this latter clade, which we collectively name Stomatocarida, dividing Malacostraca into three clades: Leptostraca, Peracarida, and Stomatocarida. A new Bayesian divergence time estimation is conducted using 13 vetted fossils. We review our results in the context of other pancrustacean phylogenetic hypotheses and highlight 15 key taxa to sample in future studies.
Affiliation(s)
- James P Bernot
- Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Christopher L Owen
- Systematic Entomology Laboratory, USDA-ARS, ℅ National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Joanna M Wolfe
- Museum of Comparative Zoology and Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Kenneth Meland
- Department of Biology, University of Bergen, Bergen, Norway
- Jørgen Olesen
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Keith A Crandall
- Department of Invertebrate Zoology, US National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA

2
Czech L, Stamatakis A, Dunthorn M, Barbera P. Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade. Front Bioinform 2022; 2:871393. [PMID: 36304302 PMCID: PMC9580882 DOI: 10.3389/fbinf.2022.871393] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 12/20/2022]
Abstract
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. Thereby, one can increase the accuracy of metagenomic surveys and eliminate the requirement for having exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool per se, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first 10 years. In particular, the goals of this review are 1) to motivate the usage of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, 5) to showcase typical placement-based analyses, and how they can help to analyze, visualize, and interpret phylogenetic placement data.
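To make the taxonomic-assignment use case more concrete, here is a minimal Python sketch, assuming each branch of the reference tree is labeled with the taxonomic path of the references below it and that a placement tool has already reported likelihood weight ratios (LWRs) for one query. The branch IDs, labels, and the 0.8 threshold are hypothetical, and this is a simplified illustration rather than the algorithm of any particular placement tool.

```python
from collections import defaultdict

# Hypothetical reference tree: each branch ID maps to the taxonomic path of
# the reference sequences below it (illustrative labels only).
EDGE_TAXONOMY = {
    "e1": ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    "e2": ("Bacteria", "Proteobacteria", "Alphaproteobacteria"),
    "e3": ("Bacteria", "Firmicutes", "Bacilli"),
}


def assign_taxonomy(placements, edge_taxonomy, threshold=0.8):
    """Return the deepest taxonomic path whose summed LWR reaches threshold.

    placements: list of (edge_id, lwr) pairs for one query; the LWRs are
    assumed to sum to about 1 across all candidate branches.
    """
    support = defaultdict(float)
    for edge, lwr in placements:
        path = edge_taxonomy[edge]
        # Credit this placement's weight to every rank along the path.
        for depth in range(1, len(path) + 1):
            support[path[:depth]] += lwr
    best = ()
    for prefix, weight in support.items():
        if weight >= threshold and len(prefix) > len(best):
            best = prefix
    return best


placements = [("e1", 0.5), ("e2", 0.4), ("e3", 0.1)]
print(assign_taxonomy(placements, EDGE_TAXONOMY))  # ('Bacteria', 'Proteobacteria')
```

The query is assigned only down to Proteobacteria because its class-level weight is split across two branches. Keeping the threshold above 0.5 ensures that at most one label per rank can qualify.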
Affiliation(s)
- Lucas Czech
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Micah Dunthorn
- Natural History Museum, University of Oslo, Oslo, Norway

3
Carneiro de Melo Moura C, Setyaningsih CA, Li K, Merk MS, Schulze S, Raffiudin R, Grass I, Behling H, Tscharntke T, Westphal C, Gailing O. Biomonitoring via DNA metabarcoding and light microscopy of bee pollen in rainforest transformation landscapes of Sumatra. BMC Ecol Evol 2022; 22:51. [PMID: 35473550 PMCID: PMC9040256 DOI: 10.1186/s12862-022-02004-x] [Received: 10/26/2021] [Accepted: 04/07/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Intense conversion of tropical forests into agricultural systems contributes to habitat loss and the decline of ecosystem functions. Plant-pollinator interactions buffer the process of forest fragmentation, ensuring gene flow across isolated patches of forests by pollen transfer. In this study, we identified the composition of pollen grains stored in pot-pollen of stingless bees, Tetragonula laeviceps, via dual-locus DNA metabarcoding (ITS2 and rbcL) and light microscopy, and compared the taxonomic coverage of pollen sampled in distinct land-use systems categorized in four levels of management intensity (forest, shrub, rubber, and oil palm) for landscape characterization. RESULTS Plant composition differed significantly between DNA metabarcoding and light microscopy. The overlap in the plant families identified via light microscopy and DNA metabarcoding techniques was low and ranged from 22.6 to 27.8%. Taxonomic assignments showed a dominance of pollen from bee-pollinated plants, including oil-bearing crops such as the introduced species Elaeis guineensis (Arecaceae) as one of the predominant taxa in the pollen samples across all four land-use types. Native plant families Moraceae, Euphorbiaceae, and Cannabaceae appeared in high proportion in the analyzed pollen material. One-way ANOVA (p > 0.05), PERMANOVA (R² values range from 0.14003 to 0.17684, for all tests p-value > 0.5), and NMDS (stress values ranging from 0.1515 to 0.1859) indicated a lack of differentiation between the species composition and diversity of pollen type in the four distinct land-use types, supporting the influx of pollen from adjacent areas. CONCLUSIONS Stingless bees collected pollen from a variety of agricultural crops, weeds, and wild plants. Plant composition detected at the family level from the pollen samples likely reflects the plant composition at the landscape level rather than the plot level. 
In our study, the plant diversity in pollen from colonies installed in land-use systems with distinct levels of forest transformation was highly homogeneous, reflecting a large influx of pollen transported by stingless bees through distinct land-use types. The dual-locus metabarcoding approach and visual pollen identification differed greatly in their detection of the plant community; therefore, a combination of both methods is recommended for biodiversity assessments based on pollen identification.
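The abstract reports a family-level overlap of 22.6 to 27.8% between light microscopy and DNA metabarcoding but does not give the formula. A minimal sketch, assuming overlap is measured as the Jaccard index (families found by both methods divided by families found by either); the family lists below are illustrative, not data from the study.

```python
def family_overlap(microscopy: set[str], metabarcoding: set[str]) -> float:
    """Jaccard index: families found by both methods / families found by either."""
    union = microscopy | metabarcoding
    if not union:
        return 0.0  # nothing detected by either method
    return len(microscopy & metabarcoding) / len(union)


# Hypothetical detections per method (not the study's results).
microscopy = {"Arecaceae", "Moraceae", "Euphorbiaceae", "Fabaceae"}
metabarcoding = {"Arecaceae", "Moraceae", "Cannabaceae", "Poaceae", "Myrtaceae"}
print(f"{family_overlap(microscopy, metabarcoding):.1%}")  # 2 shared of 7 total
```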
Affiliation(s)
- Christina A Setyaningsih
- Department of Palynology and Climate Dynamics, Albrecht-von-Haller-Institute for Plant Sciences, University of Göttingen, 37073, Göttingen, Germany
- Kevin Li
- Agroecology, Department of Crop Sciences, University of Göttingen, Grisebachstrasse 6, 37077, Göttingen, Germany
- Miryam Sarah Merk
- Statistics and Econometrics, University of Göttingen, Göttingen, Germany
- Sonja Schulze
- Agroecology, Department of Crop Sciences, University of Göttingen, Grisebachstrasse 6, 37077, Göttingen, Germany
- Rika Raffiudin
- Department of Biology, IPB University ID, Bogor, West Java, 16880, Indonesia
- Ingo Grass
- Department of Ecology of Tropical Agricultural Systems, University of Hohenheim, 70599, Stuttgart, Germany
- Hermann Behling
- Department of Palynology and Climate Dynamics, Albrecht-von-Haller-Institute for Plant Sciences, University of Göttingen, 37073, Göttingen, Germany
- Teja Tscharntke
- Agroecology, Department of Crop Sciences, University of Göttingen, Grisebachstrasse 6, 37077, Göttingen, Germany
- Catrin Westphal
- Functional Agrobiodiversity, Department of Crop Sciences, University of Göttingen, Grisebachstrasse 6, 37077, Göttingen, Germany
- Oliver Gailing
- Department of Forest Genetics and Forest Tree Breeding, University of Göttingen, 37077, Göttingen, Germany
- Centre of Biodiversity and Sustainable Land Use, University of Göttingen, 37077, Göttingen, Germany

4
Growth of Biological Complexity from Prokaryotes to Hominids Reflected in the Human Genome. Int J Mol Sci 2021; 22:ijms222111640. [PMID: 34769071 PMCID: PMC8583824 DOI: 10.3390/ijms222111640] [Received: 08/31/2021] [Revised: 10/20/2021] [Accepted: 10/25/2021] [Indexed: 12/12/2022]
Abstract
The growth of complexity in evolution is a most intriguing phenomenon. Using gene phylostratigraphy, we showed this growth (as reflected in regulatory mechanisms) in the human genome, tracing the path from prokaryotes to hominids. Generally, the different regulatory gene families expanded at different times, yet only up to the Euteleostomi (bony vertebrates). The only exception was the expansion of transcription factors (TF) in placentals; however, we argue that this was not related to an increase in general complexity. Surprisingly, although TF originated in the Prokaryota while chromatin appeared only in the Eukaryota, the expansion of epigenetic factors predated the expansion of TF. Signaling receptors, tumor suppressors, oncogenes, and aging- and disease-associated genes (which indicate vulnerabilities of complex organization and are strongly enriched in regulatory genes) also expanded only up to the Euteleostomi. The complexity-related gene properties (protein size, number of alternatively spliced mRNAs, length of untranslated mRNA regions, number of biological processes per gene, number of disordered regions in a protein, and density of TF–TF interactions) rose in multicellular organisms and declined after the Euteleostomi, and possibly earlier. At the same time, the speed of protein sequence evolution sharply increased in the genes that originated after the Euteleostomi. Thus, several lines of evidence indicate that the molecular mechanisms of complexity growth changed over time, and in the phyletic lineage leading to humans, the most salient shift occurred after the basic vertebrate body plan, with its bony skeleton, was fixed. The obtained results can be useful for evolutionary medicine.

5
Schön ME, Eme L, Ettema TJG. PhyloMagnet: fast and accurate screening of short-read meta-omics data using gene-centric phylogenetics. Bioinformatics 2020; 36:1718-1724. [PMID: 31647547 PMCID: PMC7703773 DOI: 10.1093/bioinformatics/btz799] [Received: 04/09/2019] [Revised: 09/20/2019] [Accepted: 10/23/2019] [Indexed: 11/18/2022]
Abstract
Motivation Metagenomic and metatranscriptomic sequencing have become increasingly popular tools for producing massive amounts of short-read data, often used for the reconstruction of draft genomes or the detection of (active) genes in microbial communities. Unfortunately, sequence assemblies of such datasets generally remain a computationally challenging task. Frequently, researchers are only interested in a specific group of organisms or genes; yet, the assembly of multiple datasets only to identify candidate sequences for a specific question is sometimes prohibitively slow, forcing researchers to select a subset of available datasets to address their question. Here, we present PhyloMagnet, a workflow to screen meta-omics datasets for taxa and genes of interest using gene-centric assembly and phylogenetic placement of sequences. Results Using PhyloMagnet, we could identify up to 87% of the genera in an in vitro mock community with variable abundances, while the false positive predictions per single gene tree ranged from 0 to 23%. When applied to a group of metagenomes for which a set of metagenome assembled genomes (MAGs) have been published, we could detect the majority of the taxonomic labels that the MAGs had been annotated with. In a metatranscriptomic setting, the phylogenetic placement of assembled contigs corresponds to that of transcripts obtained from transcriptome assembly. Availability and implementation PhyloMagnet is built using Nextflow, available at github.com/maxemil/PhyloMagnet and is developed and tested on Linux. It is released under the open source GNU GPL licence and documentation is available at phylomagnet.readthedocs.io. Version 0.5 of PhyloMagnet was used for all benchmarking experiments. Supplementary information Supplementary data are available at Bioinformatics online.
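The two benchmark figures in this abstract (share of mock-community genera recovered, and false positives per gene tree) can be computed along the lines of the sketch below. It assumes recall is the share of expected genera that were detected and that the false-positive rate is the share of predicted genera absent from the mock community; PhyloMagnet's own benchmarking may define these differently, and the genus lists are invented.

```python
def screening_metrics(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Return (recall, false-positive fraction) for one screening result."""
    # Share of the known community that the screen detected.
    recall = len(predicted & expected) / len(expected) if expected else 0.0
    # Share of predictions that are not members of the known community.
    fp_fraction = len(predicted - expected) / len(predicted) if predicted else 0.0
    return recall, fp_fraction


mock_community = {"Escherichia", "Bacillus", "Pseudomonas", "Staphylococcus"}
one_gene_tree = {"Escherichia", "Bacillus", "Pseudomonas", "Vibrio"}
print(screening_metrics(one_gene_tree, mock_community))  # (0.75, 0.25)
```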
Affiliation(s)
- Max E Schön
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, SE 75123, Sweden
- Laura Eme
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, SE 75123, Sweden
- Ecology, Systematics and Evolution, CNRS, Paris-Sud University, 91400 Orsay, France
- Thijs J G Ettema
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, SE 75123, Sweden
- Laboratory of Microbiology, Department of Agrotechnology and Food Sciences, Wageningen University, Wageningen 6708 WE, The Netherlands

6
Jain A, Kihara D. Phylo-PFP: improved automated protein function prediction using phylogenetic distance of distantly related sequences. Bioinformatics 2019; 35:753-759. [PMID: 30165572 DOI: 10.1093/bioinformatics/bty704] [Received: 04/05/2018] [Revised: 07/30/2018] [Accepted: 08/23/2018] [Indexed: 02/03/2023]
Abstract
MOTIVATION Function annotation of proteins is fundamental in contemporary biology across fields including genomics, molecular biology, biochemistry, systems biology and bioinformatics. Function prediction is indispensable in providing clues for interpreting omics-scale data as well as in assisting biologists to build hypotheses for designing experiments. As sequencing genomes is now routine due to the rapid advancement of sequencing technologies, computational protein function prediction methods have become increasingly important. A conventional method of annotating a protein sequence is to transfer functions from top hits of a homology search; however, this approach has substantial shortcomings, including low coverage in genome annotation. RESULTS Here we have developed Phylo-PFP, a new sequence-based protein function prediction method, which mines functional information from a broad range of similar sequences, including those with low sequence similarity identified by a PSI-BLAST search. To evaluate functional similarity between identified sequences and the query protein more accurately, Phylo-PFP reranks retrieved sequences by considering their phylogenetic distance. Compared to its predecessor, PFP, which was among the top-ranked methods in the second round of the Critical Assessment of Functional Annotation (CAFA2), Phylo-PFP demonstrated substantial improvement in prediction accuracy. Phylo-PFP was further shown to outperform the prediction programs that ranked at the top in CAFA2. AVAILABILITY AND IMPLEMENTATION The Phylo-PFP web server is available at http://kiharalab.org/phylo_pfp.php. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
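The abstract does not give Phylo-PFP's scoring function. The sketch below only illustrates the general idea of down-weighting homology-search hits by phylogenetic distance before transferring their annotations to the query. The -log10(E-value) weight, the linear distance penalty, the `max_dist` cutoff, the function name `score_terms`, and the term IDs are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def score_terms(hits, max_dist=2.0):
    """Rank function terms transferred from hits, penalizing distant hits.

    hits: list of (evalue, phylo_distance, terms) tuples, where phylo_distance
    is the hit's distance to the query in a gene tree (assumed precomputed).
    """
    scores = defaultdict(float)
    for evalue, dist, terms in hits:
        # Sequence-similarity weight: more significant hits count more.
        similarity = -math.log10(max(evalue, 1e-300))
        # Hypothetical linear penalty; hits at or beyond max_dist add nothing.
        closeness = max(0.0, 1.0 - dist / max_dist)
        for term in terms:
            scores[term] += similarity * closeness
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


hits = [
    (1e-50, 0.2, ["term_A"]),  # strong hit, close relative
    (1e-50, 1.8, ["term_B"]),  # strong hit, but phylogenetically distant
    (1e-5, 0.1, ["term_B"]),   # weak hit, close relative
]
print([term for term, _ in score_terms(hits)])  # ['term_A', 'term_B']
```

Without the distance penalty, term_B would outscore term_A (roughly 50 + 5 versus 50), which is the kind of case that distance-aware reranking is meant to correct.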
Affiliation(s)
- Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA

7
Inoue J, Nakashima K, Satoh N. ORTHOSCOPE Analysis Reveals the Presence of the Cellulose Synthase Gene in All Tunicate Genomes but Not in Other Animal Genomes. Genes (Basel) 2019; 10:genes10040294. [PMID: 30974905 PMCID: PMC6523144 DOI: 10.3390/genes10040294] [Received: 03/08/2019] [Revised: 04/03/2019] [Accepted: 04/05/2019] [Indexed: 01/08/2023]
Abstract
Tunicates or urochordates (comprising ascidians, larvaceans, and salps) are the only metazoans that can synthesize cellulose, a biological function usually associated with bacteria and plants but not animals. Tunicate cellulose, or tunicine, is a major component of the outer acellular covering (tunic) that encloses the entire body of these organisms. Previous studies have suggested that the prokaryotic cellulose synthase gene (CesA) was horizontally transferred into the genome of a tunicate ancestor. However, no convenient tools have been devised to determine whether only tunicates harbor CesA. ORTHOSCOPE is a recently developed tool used to identify orthologous genes and to examine the phylogenetic relationships of molecules within major metazoan taxa. The present analysis with this tool revealed CesA orthologs in all sequenced tunicate genomes and their absence from other metazoan genomes. This supports an evolutionary origin of animal cellulose and provides insights into the evolution of this animal taxon.
Affiliation(s)
- Jun Inoue
- Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan
- Keisuke Nakashima
- Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan
- Noriyuki Satoh
- Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan

8
Casola C. From De Novo to "De Nono": The Majority of Novel Protein-Coding Genes Identified with Phylostratigraphy Are Old Genes or Recent Duplicates. Genome Biol Evol 2018; 10:2906-2918. [PMID: 30346517 PMCID: PMC6239577 DOI: 10.1093/gbe/evy231] [Accepted: 10/10/2018] [Indexed: 12/11/2022]
Abstract
The evolution of novel protein-coding genes from noncoding regions of the genome is one of the most compelling pieces of evidence for genetic innovations in nature. One popular approach to identify de novo genes is phylostratigraphy, which consists of determining the approximate time of origin (age) of a gene based on its distribution along a species phylogeny. Several studies have revealed significant flaws in determining the age of genes, including de novo genes, using phylostratigraphy alone. However, the rate of false positives in de novo gene surveys, based on phylostratigraphy, remains unknown. Here, I reanalyze the findings from three studies, two of which identified tens to hundreds of rodent-specific de novo genes adopting a phylostratigraphy-centered approach. Most putative de novo genes discovered in these investigations are no longer included in recently updated mouse gene sets. Using a combination of synteny information and sequence similarity searches, I show that ∼60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with nonrodent mammals. These results led to an estimated rate of ∼12 de novo genes per million years in mouse. Contrary to a previous study (Wilson BA, Foy SG, Neme R, Masel J. 2017. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol. 1:0146), I found no evidence supporting the preadaptation hypothesis of de novo gene formation. Nearly half of the de novo genes confirmed in this study are within older genes, indicating that co-option of preexisting regulatory regions and a higher GC content may facilitate the origin of novel genes.
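The core phylostratigraphic step described above, dating a gene by the most distant lineage in which a homolog is detected, can be sketched as follows. Strata follow the lineage leading to mouse, and each lists one illustrative species that diverges from that lineage at that node. Real analyses use many species per stratum and a homology-search cutoff, both omitted here. A homolog that goes undetected in distant species makes a gene look younger than it is, which is one reason phylostratigraphy alone can misdate genes.

```python
# Phylostrata along the mouse lineage, oldest first. Each entry pairs the node
# name with species that split from the mouse lineage at that node
# (one illustrative species per stratum).
PHYLOSTRATA = [
    ("Cellular organisms", {"Escherichia coli"}),
    ("Opisthokonta", {"Saccharomyces cerevisiae"}),
    ("Bilateria", {"Drosophila melanogaster"}),
    ("Euteleostomi", {"Danio rerio"}),
    ("Euarchontoglires", {"Homo sapiens"}),
    ("Murinae", {"Rattus norvegicus"}),
    ("Mus musculus", {"Mus musculus"}),
]


def gene_age(species_with_homolog: set[str]) -> str:
    """Assign a gene to the oldest stratum in which a homolog was detected."""
    for stratum, diverging_species in PHYLOSTRATA:
        if diverging_species & species_with_homolog:
            return stratum
    raise ValueError("no homolog found, not even in the focal species")


print(gene_age({"Mus musculus", "Rattus norvegicus"}))  # Murinae
print(gene_age({"Mus musculus"}))  # Mus musculus: a candidate de novo gene
```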
Affiliation(s)
- Claudio Casola
- Department of Ecosystem Science and Management, Texas A&M University

9
Walker JF, Yang Y, Feng T, Timoneda A, Mikenas J, Hutchison V, Edwards C, Wang N, Ahluwalia S, Olivieri J, Walker-Hale N, Majure LC, Puente R, Kadereit G, Lauterbach M, Eggli U, Flores-Olvera H, Ochoterena H, Brockington SF, Moore MJ, Smith SA. From cacti to carnivores: Improved phylotranscriptomic sampling and hierarchical homology inference provide further insight into the evolution of Caryophyllales. Am J Bot 2018; 105:446-462. [PMID: 29738076 DOI: 10.1002/ajb2.1069] [Received: 10/13/2017] [Accepted: 01/04/2018] [Indexed: 05/27/2023]
Abstract
PREMISE OF THE STUDY The Caryophyllales contain ~12,500 species and are known for their cosmopolitan distribution, convergence of trait evolution, and extreme adaptations. Some relationships within the Caryophyllales, like those of many large plant clades, remain unclear, and phylogenetic studies often recover alternative hypotheses. We explore the utility of broad and dense transcriptome sampling across the order for resolving evolutionary relationships in Caryophyllales. METHODS We generated 84 transcriptomes and combined these with 224 publicly available transcriptomes to perform a phylogenomic analysis of Caryophyllales. To overcome the computational challenge of ortholog detection in such a large data set, we developed an approach for clustering gene families that allowed us to analyze >300 transcriptomes and genomes. We then inferred the species relationships using multiple methods and performed gene-tree conflict analyses. KEY RESULTS Our phylogenetic analyses resolved many clades with strong support, but also showed significant gene-tree discordance. This discordance is not only a common feature of phylogenomic studies, but also represents an opportunity to understand processes that have structured phylogenies. We also found taxon sampling influences species-tree inference, highlighting the importance of more focused studies with additional taxon sampling. CONCLUSIONS Transcriptomes are useful both for species-tree inference and for uncovering evolutionary complexity within lineages. Through analyses of gene-tree conflict and multiple methods of species-tree inference, we demonstrate that phylogenomic data can provide unparalleled insight into the evolutionary history of Caryophyllales. We also discuss a method for overcoming computational challenges associated with homolog clustering in large data sets.
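The abstract does not detail the authors' clustering approach (the title describes it as hierarchical homology inference), and it is not reproduced here. As a point of reference only, the simplest way to turn pairwise homology hits into gene-family clusters is to take connected components of the hit graph, sketched below with a union-find structure. The sequence IDs and hits are hypothetical, and the similarity filter is assumed to have been applied upstream.

```python
def cluster_homologs(seq_ids, hits):
    """Group sequences into connected components of a homology-hit graph.

    seq_ids: all sequence IDs; hits: (id_a, id_b) pairs that passed a
    similarity filter (for example, a BLAST E-value cutoff, applied upstream).
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


ids = ["beta_1", "beta_2", "cactus_1", "cactus_2", "dianthus_1"]
hits = [("beta_1", "cactus_1"), ("cactus_1", "dianthus_1"), ("beta_2", "cactus_2")]
print(cluster_homologs(ids, hits))
# [['beta_1', 'cactus_1', 'dianthus_1'], ['beta_2', 'cactus_2']]
```

Connected components behave like single-linkage clustering, so a few spurious hits can chain unrelated families together. This is one reason large phylogenomic pipelines often refine such clusters further.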
Affiliation(s)
- Joseph F Walker
- Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI, 48109-1048, USA
- Ya Yang
- Department of Plant and Microbial Biology, University of Minnesota-Twin Cities, 1445 Gortner Avenue, St. Paul, MN, 55108, USA
- Tao Feng
- Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Alfonso Timoneda
- Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Jessica Mikenas
- Department of Biology, Oberlin College, Science Center K111, 119 Woodland Street, Oberlin, OH, 44074-1097, USA
- Vera Hutchison
- Department of Biology, Oberlin College, Science Center K111, 119 Woodland Street, Oberlin, OH, 44074-1097, USA
- Caroline Edwards
- Department of Biology, Oberlin College, Science Center K111, 119 Woodland Street, Oberlin, OH, 44074-1097, USA
- Ning Wang
- Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI, 48109-1048, USA
- Sonia Ahluwalia
- Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI, 48109-1048, USA
- Julia Olivieri
- Department of Biology, Oberlin College, Science Center K111, 119 Woodland Street, Oberlin, OH, 44074-1097, USA
- Institute of Computational and Mathematical Engineering (ICME), Stanford University, 475 Via Ortega, Suite B060, Stanford, CA, 94305-4042, USA
- Nathanael Walker-Hale
- School of Biological Sciences, Victoria University of Wellington, Kelburn Parade, Kelburn, Wellington, 6012, New Zealand
- Lucas C Majure
- Department of Research, Conservation and Collections, Desert Botanical Garden, 1201 N. Galvin Pkwy, Phoenix, AZ, 85008, USA
- Raúl Puente
- Department of Research, Conservation and Collections, Desert Botanical Garden, 1201 N. Galvin Pkwy, Phoenix, AZ, 85008, USA
- Gudrun Kadereit
- Institut für Molekulare Physiologie, Johannes Gutenberg-Universität Mainz, D-55099, Mainz, Germany
- Institut für Molekulare und Organismische Evolutionsbiologie, Johannes Gutenberg-Universität Mainz, D-55099, Mainz, Germany
- Maximilian Lauterbach
- Institut für Molekulare Physiologie, Johannes Gutenberg-Universität Mainz, D-55099, Mainz, Germany
- Institut für Molekulare und Organismische Evolutionsbiologie, Johannes Gutenberg-Universität Mainz, D-55099, Mainz, Germany
- Urs Eggli
- Sukkulenten-Sammlung Zürich / Grün Stadt Zürich, Mythenquai 88, CH-8002, Zürich, Switzerland
- Hilda Flores-Olvera
- Departamento de Botánica, Universidad Nacional Autónoma de México, Apartado Postal 70-367, 04510, Mexico City, Mexico
- Helga Ochoterena
- Departamento de Botánica, Universidad Nacional Autónoma de México, Apartado Postal 70-367, 04510, Mexico City, Mexico
- Michael J Moore
- Department of Biology, Oberlin College, Science Center K111, 119 Woodland Street, Oberlin, OH, 44074-1097, USA
- Stephen A Smith
- Department of Ecology & Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, MI, 48109-1048, USA

10
Genome-Guided Phylo-Transcriptomic Methods and the Nuclear Phylogentic Tree of the Paniceae Grasses. Sci Rep 2017; 7:13528. [PMID: 29051622 PMCID: PMC5648822 DOI: 10.1038/s41598-017-13236-z] [Received: 05/05/2017] [Accepted: 09/20/2017] [Indexed: 11/23/2022]
Abstract
The past few years have witnessed a paradigm shift in molecular systematics from phylogenetic methods (using one or a few genes) to those that can be described as phylogenomics (phylogenetic inference with entire genomes). One approach that has recently emerged is phylo-transcriptomics (transcriptome-based phylogenetic inference). As in any phylogenetics experiment, accurate orthology inference is critical to phylo-transcriptomics. To date, most analyses have inferred orthology based either on pure sequence similarity or using gene-tree approaches. The use of conserved genome synteny in orthology detection has been relatively under-employed in phylogenetics, mainly due to the cost of sequencing genomes. While current trends focus on the quantity of genes included in an analysis, the use of synteny is likely to improve the quality of ortholog inference. In this study, we combine de novo transcriptome data and sequenced genomes from an economically important group of grass species, the tribe Paniceae, to make phylogenomic inferences. This method, which we call “genome-guided phylo-transcriptomics”, is compared to other recently published orthology inference pipelines, and benchmarked using a set of sequenced genomes from across the grasses. These comparisons provide a framework for future researchers to evaluate the costs and benefits of adding sequenced genomes to transcriptome data sets.

11
Battenberg K, Lee EK, Chiu JC, Berry AM, Potter D. OrthoReD: a rapid and accurate orthology prediction tool with low computational requirement. BMC Bioinformatics 2017. [PMID: 28633662 PMCID: PMC5479036 DOI: 10.1186/s12859-017-1726-5] [Indexed: 11/10/2022]
Abstract
Background Identifying orthologous genes is an initial step required for phylogenetics, and it is also a common strategy employed in functional genetics to find candidates for functionally equivalent genes across multiple species. However, in silico orthology prediction tools often require large computational resources only available on computing clusters. Here we present OrthoReD, an open-source orthology prediction tool that requires only a desktop computer yet achieves accuracy comparable to published tools. The low computational resource requirement of OrthoReD is achieved by repeating orthology searches on one gene of interest at a time, thereby generating a reduced dataset to limit the scope of orthology search for each gene of interest. Results The output of OrthoReD was highly similar to the outputs of two other published orthology prediction tools, OrthologID and/or OrthoDB, for the three datasets tested, which represented three phyla with different ranges of species diversity and different numbers of genomes included. Median CPU time for ortholog prediction per gene by OrthoReD executed on a desktop computer was <15 min even for the largest dataset tested, which included all coding sequences of 100 bacterial species. Conclusions With high-throughput sequencing, unprecedented numbers of genes from non-model organisms are available, creating an increasing need for clear information about their orthologies and/or functional equivalents in model organisms. OrthoReD is not only fast and accurate as an orthology prediction tool, but also gives researchers flexibility in the number of genes analyzed at a time, without requiring a high-performance computing cluster. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1726-5) contains supplementary material, which is available to authorized users.
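The abstract attributes OrthoReD's low resource use to searching one gene of interest at a time within a reduced dataset, but does not say how that dataset is built. A minimal sketch, assuming the reduced set is simply the query plus its k most similar sequences by a precomputed score such as a bit score; the function name `reduced_dataset`, the value of k, and the scores are hypothetical.

```python
import heapq


def reduced_dataset(query: str, similarity: dict[str, float], k: int) -> list[str]:
    """Return the query followed by its k most similar sequences.

    similarity maps sequence ID -> similarity score to the query (assumed to
    come from an earlier homology search, e.g. bit scores).
    """
    nearest = heapq.nlargest(k, similarity.items(), key=lambda item: item[1])
    return [query] + [seq_id for seq_id, _ in nearest]


scores = {"geneX_sp1": 50.0, "geneX_sp2": 300.0, "geneX_sp3": 120.0}
print(reduced_dataset("geneX_query", scores, k=2))
# ['geneX_query', 'geneX_sp2', 'geneX_sp3']
```

Because each iteration handles only k + 1 sequences, the memory and compute needed for the downstream orthology analysis of one gene stay bounded regardless of how many genomes are in the full collection, in line with the desktop-scale goal described above.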
Affiliation(s)
- Kai Battenberg
- Department of Plant Sciences, University of California, Davis, CA, USA
- Ernest K Lee
- Department of Entomology and Nematology, University of California, Davis, CA, USA
- Joanna C Chiu
- Department of Entomology and Nematology, University of California, Davis, CA, USA
- Alison M Berry
- Department of Plant Sciences, University of California, Davis, CA, USA
- Daniel Potter
- Department of Plant Sciences, University of California, Davis, CA, USA