51
Wong GKS, Soltis DE, Leebens-Mack J, Wickett NJ, Barker MS, Van de Peer Y, Graham SW, Melkonian M. Sequencing and Analyzing the Transcriptomes of a Thousand Species Across the Tree of Life for Green Plants. Annu Rev Plant Biol 2020; 71:741-765. [PMID: 31851546 DOI: 10.1146/annurev-arplant-042916-041040] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Indexed: 05/23/2023]
Abstract
The 1,000 Plants (1KP) initiative was the first large-scale effort to collect next-generation sequencing (NGS) data across a phylogenetically representative sampling of species for a major clade of life, in this case the Viridiplantae, or green plants. As an international multidisciplinary consortium, we focused on plant evolution and its practical implications. Among the major outcomes were the inference of a reference species tree for green plants by phylotranscriptomic analysis of low-copy genes, a survey of paleopolyploidy (whole-genome duplications) across the Viridiplantae, the inferred evolutionary histories for many gene families and biological processes, the discovery of novel light-sensitive proteins for optogenetic studies in mammalian neuroscience, and elucidation of the genetic network for a complex trait (C4 photosynthesis). Altogether, 1KP demonstrated how value can be extracted from a phylodiverse sequencing data set, providing a template for future projects that aim to generate even more data, including complete de novo genomes, across the tree of life.
Affiliation(s)
- Gane Ka-Shu Wong
  - Department of Biological Sciences and Department of Medicine, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
  - BGI-Shenzhen, Beishan Industrial Zone, Yantian District, Shenzhen 518083, China
- Douglas E Soltis
  - Florida Museum of Natural History, Gainesville, Florida 32611, USA
  - Department of Biology, University of Florida, Gainesville, Florida 32611, USA
- Jim Leebens-Mack
  - Department of Plant Biology, University of Georgia, Athens, Georgia 30602, USA
- Norman J Wickett
  - Negaunee Institute for Plant Conservation Science and Action, Chicago Botanic Garden, Glencoe, Illinois 60022, USA
- Michael S Barker
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721, USA
- Yves Van de Peer
  - Department of Plant Biotechnology and Bioinformatics, VIB Center for Plant Systems Biology, Ghent University, 9052 Ghent, Belgium
  - Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa
- Sean W Graham
  - Department of Botany, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
- Michael Melkonian
  - Faculty of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
52
Jones MG, Khodaverdian A, Quinn JJ, Chan MM, Hussmann JA, Wang R, Xu C, Weissman JS, Yosef N. Inference of single-cell phylogenies from lineage tracing data using Cassiopeia. Genome Biol 2020; 21:92. [PMID: 32290857 PMCID: PMC7155257 DOI: 10.1186/s13059-020-02000-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/30/2019] [Accepted: 03/13/2020] [Indexed: 12/14/2022] Open
Abstract
The pairing of CRISPR/Cas9-based gene editing with massively parallel single-cell readouts now enables large-scale lineage tracing. However, the rapid growth in complexity of data from these assays has outpaced our ability to accurately infer phylogenetic relationships. First, we introduce Cassiopeia, a suite of scalable maximum parsimony approaches for tree reconstruction. Second, we provide a simulation framework for evaluating algorithms and exploring lineage tracer design principles. Finally, we generate the most complex experimental lineage tracing dataset to date, 34,557 human cells continuously traced over 15 generations, and use it for benchmarking phylogenetic inference approaches. We show that Cassiopeia outperforms traditional methods by several metrics and under a wide variety of parameter regimes, and provide insight into the principles for the design of improved Cas9-enabled recorders. Together, these should broadly enable large-scale mammalian lineage tracing efforts. Cassiopeia and its benchmarking resources are publicly available at www.github.com/YosefLab/Cassiopeia.
Affiliation(s)
- Matthew G Jones
  - Biological and Medical Informatics Graduate Program, University of California San Francisco, San Francisco, CA, USA
  - Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
  - Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
- Alex Khodaverdian
  - Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
- Jeffrey J Quinn
  - Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
  - Center for RNA Systems Biology, University of California San Francisco, San Francisco, CA, USA
- Michelle M Chan
  - Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
  - Center for RNA Systems Biology, University of California San Francisco, San Francisco, CA, USA
- Jeffrey A Hussmann
  - Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
  - Center for RNA Systems Biology, University of California San Francisco, San Francisco, CA, USA
  - University of California, San Francisco, Department of Microbiology and Immunology, San Francisco, California, USA
- Robert Wang
  - Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
- Chenling Xu
  - Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
- Jonathan S Weissman
  - Howard Hughes Medical Institute, University of California San Francisco, San Francisco, CA, USA
  - Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco, CA, USA
  - Center for RNA Systems Biology, University of California San Francisco, San Francisco, CA, USA
- Nir Yosef
  - Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
  - Department of Electrical Engineering and Computer Science and Center for Computational Biology, University of California Berkeley, Berkeley, CA, USA
  - Ragon Institute of Massachusetts General Hospital - MIT and Harvard, Cambridge, MA, USA
  - Chan Zuckerberg Biohub Investigator, San Francisco, CA, USA
53
Prasanna AN, Gerber D, Kijpornyongpan T, Aime MC, Doyle VP, Nagy LG. Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships. Syst Biol 2020; 69:17-37. [PMID: 31062852 DOI: 10.1093/sysbio/syz029] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/19/2017] [Revised: 04/21/2019] [Accepted: 04/26/2019] [Indexed: 11/12/2022] Open
Abstract
Resolving deep divergences in the tree of life is challenging even for analyses of genome-scale phylogenetic data sets. Relationships between Basidiomycota subphyla, the rusts and allies (Pucciniomycotina), smuts and allies (Ustilaginomycotina), and mushroom-forming fungi and allies (Agaricomycotina) were found particularly recalcitrant both to traditional multigene and genome-scale phylogenetics. Here, we address basal Basidiomycota relationships using concatenated and gene tree-based analyses of various phylogenomic data sets to examine the contribution of several potential sources of bias. We evaluate the contribution of biological causes (hard polytomy, incomplete lineage sorting) versus unmodeled evolutionary processes and factors that exacerbate their effects (e.g., fast-evolving sites and long-branch taxa) to inferences of basal Basidiomycota relationships. Bayesian Markov Chain Monte Carlo and likelihood mapping analyses reject the hard polytomy with confidence. In concatenated analyses, fast-evolving sites and oversimplified models of amino acid substitution favored the grouping of smuts with mushroom-forming fungi, often leading to maximal bootstrap support in both concatenation and coalescent analyses. On the contrary, the most conserved data subsets grouped rusts and allies with mushroom-forming fungi, although this relationship proved labile, sensitive to model choice, to different data subsets and to missing data. Excluding putative long-branch taxa, genes with high proportions of missing data and/or with strong signal failed to reveal a consistent trend toward one or the other topology, suggesting that additional sources of conflict are at play. 
While concatenated analyses yielded strong but conflicting support, individual gene trees mostly provided poor support for any resolution of rusts, smuts, and mushroom-forming fungi, suggesting that the true Basidiomycota tree might be in a part of tree space that is difficult to access using both concatenation and gene tree-based approaches. Inference-based assessments of absolute model fit strongly reject best-fit models for the vast majority of genes, indicating a poor fit of even the most commonly used models. While this is consistent with previous assessments of site-homogenous models of amino acid evolution, this does not appear to be the sole source of confounding signal. Our analyses suggest that topologies uniting smuts with mushroom-forming fungi can arise as a result of inappropriate modeling of amino acid sites that might be prone to systematic bias. We speculate that improved models of sequence evolution could shed more light on basal splits in the Basidiomycota, which, for now, remain unresolved despite the use of whole genome data.
Affiliation(s)
- Arun N Prasanna
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
- Daniel Gerber
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
  - Institute of Archaeology, Research Centre for the Humanities, Hungarian Academy of Sciences, Budapest 1097, Hungary
- M Catherine Aime
  - Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN 47907, USA
- Vinson P Doyle
  - Department of Plant Pathology and Crop Physiology, Louisiana State University AgCenter, Baton Rouge, LA 70803, USA
- Laszlo G Nagy
  - Synthetic and Systems Biology Unit, Institute of Biochemistry, BRC-HAS, Szeged 6726, Hungary
54
Wang Q, Li H. Phylogeny of the superfamily Gelechioidea (Lepidoptera: Obtectomera), with an exploratory application on geometric morphometrics. Zool Scr 2020. [DOI: 10.1111/zsc.12407] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/30/2022]
Affiliation(s)
- Qing‐Yun Wang
  - College of Life Sciences, Nankai University, Tianjin, China
- Hou‐Hun Li
  - College of Life Sciences, Nankai University, Tianjin, China
55
Hyun DY, Sebastin R, Lee KJ, Lee GA, Shin MJ, Kim SH, Lee JR, Cho GT. Genotyping-by-Sequencing Derived Single Nucleotide Polymorphisms Provide the First Well-Resolved Phylogeny for the Genus Triticum (Poaceae). Front Plant Sci 2020; 11:688. [PMID: 32625218 PMCID: PMC7311657 DOI: 10.3389/fpls.2020.00688] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/10/2020] [Accepted: 04/30/2020] [Indexed: 05/17/2023]
Abstract
Wheat (Triticum spp.) has been an important staple food crop for mankind since the beginning of agriculture. The genus Triticum L. is composed of diploid, tetraploid, and hexaploid species, the majority of which have not yet been discriminated clearly, and hence their phylogeny and classification remain unresolved. Genotyping-by-sequencing (GBS) is an easy and affordable method that allows us to generate genome-wide single nucleotide polymorphism (SNP) markers. In this study, we used GBS to obtain SNPs covering all seven chromosomes from 283 accessions of Triticum-related genera. After filtering low-quality and redundant SNPs based on haplotype information, the GBS assay provided 14,188 high-quality SNPs that were distributed across the A (71%), B (26%), and D (2.4%) genomes. Cluster analysis and discriminant analysis of principal components (DAPC) allowed us to distinguish six distinct groups that matched well with Triticum species complexity. We constructed a Bayesian phylogenetic tree using 14,188 SNPs, in which 17 Triticum species and subspecies were discriminated. Dendrogram analysis revealed that the polyploid wheat species could be divided into groups according to the presence of A, B, D, and G genomes with strong nodal support and provided new insight into the evolution of spelt wheat. A total of 2,692 species-specific SNPs were identified to discriminate the common (T. aestivum) and durum (T. turgidum) wheat cultivars and landraces. In principal component analysis grouping, the two wheat species formed individual clusters and the SNPs were able to distinguish up to nine groups of 10 subspecies. This study demonstrated that GBS-derived SNPs could be used efficiently in genebank management to classify Triticum species and subspecies that are very difficult to distinguish by their morphological characters.
56
Du Y, Wu S, Edwards SV, Liu L. The effect of alignment uncertainty, substitution models and priors in building and dating the mammal tree of life. BMC Evol Biol 2019; 19:203. [PMID: 31694538 PMCID: PMC6833305 DOI: 10.1186/s12862-019-1534-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/07/2019] [Accepted: 10/21/2019] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. RESULTS: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAl and two filtering protocols. The final dataset contains 4 sets of alignments: before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. CONCLUSIONS: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation.
Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.
Affiliation(s)
- Yan Du
  - Department of Statistics, University of Georgia, 310 Herty Drive, Athens, GA 30606 USA
- Shaoyuan Wu
  - Jiangsu Key Laboratory of Phylogenomics & Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu 221116 People’s Republic of China
- Scott V. Edwards
  - Department of Organismic & Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138 USA
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, 310 Herty Drive, Athens, GA 30606 USA
57
Gatesy J, Sloan DB, Warren JM, Baker RH, Simmons MP, Springer MS. Partitioned coalescence support reveals biases in species-tree methods and detects gene trees that determine phylogenomic conflicts. Mol Phylogenet Evol 2019; 139:106539. [DOI: 10.1016/j.ympev.2019.106539] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/03/2018] [Revised: 06/10/2019] [Accepted: 06/17/2019] [Indexed: 12/26/2022]
58
Bos KI, Kühnert D, Herbig A, Esquivel-Gomez LR, Andrades Valtueña A, Barquera R, Giffin K, Kumar Lankapalli A, Nelson EA, Sabin S, Spyrou MA, Krause J. Paleomicrobiology: Diagnosis and Evolution of Ancient Pathogens. Annu Rev Microbiol 2019; 73:639-666. [PMID: 31283430 DOI: 10.1146/annurev-micro-090817-062436] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
The last century has witnessed progress in the study of ancient infectious disease from purely medical descriptions of past ailments to dynamic interpretations of past population health that draw upon multiple perspectives. The recent adoption of high-throughput DNA sequencing has led to an expanded understanding of pathogen presence, evolution, and ecology across the globe. This genomic revolution has led to the identification of disease-causing microbes in both expected and unexpected contexts, while also providing for the genomic characterization of ancient pathogens previously believed to be unattainable by available methods. In this review we explore the development of DNA-based ancient pathogen research, the specialized methods and tools that have emerged to authenticate and explore infectious disease of the past, and the unique challenges that persist in molecular paleopathology. We offer guidelines to mitigate the impact of these challenges, which will allow for more reliable interpretations of data in this rapidly evolving field of investigation.
Affiliation(s)
- Kirsten I Bos
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Denise Kühnert
  - Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Alexander Herbig
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Luis Roger Esquivel-Gomez
  - Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Aida Andrades Valtueña
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Rodrigo Barquera
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Karen Giffin
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Aditya Kumar Lankapalli
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Elizabeth A Nelson
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Susanna Sabin
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Maria A Spyrou
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
- Johannes Krause
  - Department of Archaeogenetics, Max Planck Institute for the Science of Human History, 07745 Jena, Germany
  - Faculty of Biological Sciences, Friedrich Schiller University, 07737 Jena, Germany
59
Tao Q, Tamura K, Battistuzzi FU, Kumar S. A Machine Learning Method for Detecting Autocorrelation of Evolutionary Rates in Large Phylogenies. Mol Biol Evol 2019; 36:811-824. [PMID: 30689923 PMCID: PMC6804408 DOI: 10.1093/molbev/msz014] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/24/2022] Open
Abstract
New species arise from pre-existing species and inherit similar genomes and environments. This predicts greater similarity of the tempo of molecular evolution between direct ancestors and descendants, resulting in autocorrelation of evolutionary rates in the tree of life. Surprisingly, molecular sequence data have not confirmed this expectation, possibly because available methods lack the power to detect autocorrelated rates. Here, we present a machine learning method, CorrTest, to detect the presence of rate autocorrelation in large phylogenies. CorrTest is computationally efficient and performs better than the available state-of-the-art method. Application of CorrTest reveals extensive rate autocorrelation in DNA and amino acid sequence evolution of mammals, birds, insects, metazoans, plants, fungi, parasitic protozoans, and prokaryotes. Therefore, rate autocorrelation is a common phenomenon throughout the tree of life. These findings suggest concordance between molecular and nonmolecular evolutionary patterns, and they will foster unbiased and precise dating of the tree of life.
Affiliation(s)
- Qiqing Tao
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Koichiro Tamura
  - Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
  - Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Tokyo, Japan
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
  - Corresponding author
60
Johnston PR, Quijada L, Smith CA, Baral HO, Hosoya T, Baschien C, Pärtel K, Zhuang WY, Haelewaters D, Park D, Carl S, López-Giráldez F, Wang Z, Townsend JP. A multigene phylogeny toward a new phylogenetic classification of Leotiomycetes. IMA Fungus 2019; 10:1. [PMID: 32647610 PMCID: PMC7325659 DOI: 10.1186/s43008-019-0002-x] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 04/24/2019] [Accepted: 04/30/2019] [Indexed: 12/31/2022] Open
Abstract
Fungi in the class Leotiomycetes are ecologically diverse, including mycorrhizas, endophytes of roots and leaves, plant pathogens, aquatic and aero-aquatic hyphomycetes, mammalian pathogens, and saprobes. These fungi are commonly detected in cultures from diseased tissue and from environmental DNA extracts. The identification of specimens from such character-poor samples increasingly relies on DNA sequencing. However, the current classification of Leotiomycetes is still largely based on morphologically defined taxa, especially at higher taxonomic levels. Consequently, the formal Leotiomycetes classification is frequently poorly congruent with the relationships suggested by DNA sequencing studies. Previous class-wide phylogenies of Leotiomycetes have been based on ribosomal DNA markers, with most of the published multi-gene studies being focussed on particular genera or families. In this paper we collate data available from specimens representing both sexual and asexual morphs from across the genetic breadth of the class, with a focus on generic type species, to present a phylogeny based on up to 15 concatenated genes across 279 specimens. Included in the dataset are genes that were extracted from 72 of the genomes available for the class, including 10 new genomes released with this study. To test the statistical support for the deepest branches in the phylogeny, an additional phylogeny based on 3156 genes from 51 selected genomes is also presented. To fill some of the taxonomic gaps in the 15-gene phylogeny, we further present an ITS gene tree, particularly targeting ex-type specimens of generic type species. A small number of novel taxa are proposed: Marthamycetales ord. nov., and Drepanopezizaceae and Mniaeciaceae fams. nov. The formal taxonomic changes are limited in part because of the ad hoc nature of taxon and specimen selection, based purely on the availability of data. 
The phylogeny constitutes a framework for enabling future taxonomically targeted studies using deliberate specimen selection. Such studies will ideally include designation of epitypes for the type species of those genera for which DNA is not able to be extracted from the original type specimen, and consideration of morphological characters whenever genetically defined clades are recognized as formal taxa within a classification.
Affiliation(s)
- Peter R. Johnston
  - Manaaki Whenua Landcare Research, Private Bag 92170, Auckland, 1142 New Zealand
- Luis Quijada
  - Department of Organismic and Evolutionary Biology, Harvard Herbarium, 22 Divinity Ave, Cambridge, MA 02138 USA
- Tsuyoshi Hosoya
  - Department of Botany, National Museum of Nature and Science, 4-1-1 Amakubo, Tsukuba, Ibaraki 305-0005 Japan
- Christiane Baschien
  - Leibniz-Institute DSMZ German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7B, 38124 Braunschweig, Germany
- Kadri Pärtel
  - Institute of Ecology and Earth Sciences, University of Tartu, Lai 40, EE-51005 Tartu, Estonia
- Wen-Ying Zhuang
  - State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, 100101 China
- Danny Haelewaters
  - Department of Organismic and Evolutionary Biology, Harvard Herbarium, 22 Divinity Ave, Cambridge, MA 02138 USA
  - Faculty of Science, University of South Bohemia, Branišovská 31, 370 05 České Budějovice, Czech Republic
- Duckchul Park
  - Manaaki Whenua Landcare Research, Private Bag 92170, Auckland, 1142 New Zealand
- Steffen Carl
  - Leibniz-Institute DSMZ German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7B, 38124 Braunschweig, Germany
- Zheng Wang
  - Department of Biostatistics, Yale University, 135 College St, New Haven, CT 06510 USA
- Jeffrey P. Townsend
  - Department of Biostatistics, Yale University, 135 College St, New Haven, CT 06510 USA
61
Olofsson JK, Cantera I, Van de Paer C, Hong-Wa C, Zedane L, Dunning LT, Alberti A, Christin PA, Besnard G. Phylogenomics using low-depth whole genome sequencing: A case study with the olive tribe. Mol Ecol Resour 2019; 19:877-892. [PMID: 30934146 DOI: 10.1111/1755-0998.13016] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 06/05/2018] [Revised: 03/19/2019] [Accepted: 03/25/2019] [Indexed: 12/20/2022]
Abstract
Species trees have traditionally been inferred from a few selected markers, and genome-wide investigations remain largely restricted to model organisms or small groups of species for which sampling of fresh material is available, leaving out most of the existing and historical species diversity. The genomes of an increasing number of species, including specimens extracted from natural history collections, are being sequenced at low depth. While these data sets are widely used to analyse organelle genomes, the nuclear fraction is generally ignored. Here we evaluate different reference-based methods to infer phylogenies of large taxonomic groups from such data sets. Using the example of the Oleeae tribe, a worldwide-distributed group, we build phylogenies based on single nucleotide polymorphisms (SNPs) obtained using two reference genomes (the olive and ash trees). The inferred phylogenies are overall congruent, yet present differences that might reflect the effect of distance to the reference on the amount of missing data. To limit this issue, genome complexity was reduced by using pairs of orthologous coding sequences as the reference, thus allowing us to combine SNPs obtained using two distinct references. Concatenated and coalescence trees based on these combined SNPs suggest events of incomplete lineage sorting and/or hybridization during the diversification of this large phylogenetic group. Our results show that genome-wide phylogenetic trees can be inferred from low-depth sequence data sets for eukaryote groups with complex genomes, and histories of reticulate evolution. This opens new avenues for large-scale phylogenomics and biogeographical analyses covering both the extant and the historical diversity stored in museum collections.
Affiliation(s)
- Jill K Olofsson
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Isabel Cantera
  - Laboratoire Évolution and Diversité Biologique (EDB, UMR5174), CNRS, UPS, IRD, Université de Toulouse, Toulouse, France
- Céline Van de Paer
  - Laboratoire Évolution and Diversité Biologique (EDB, UMR5174), CNRS, UPS, IRD, Université de Toulouse, Toulouse, France
- Cynthia Hong-Wa
  - Claude E. Phillips Herbarium, Delaware State University, Dover, Delaware
- Loubab Zedane
  - Laboratoire Évolution and Diversité Biologique (EDB, UMR5174), CNRS, UPS, IRD, Université de Toulouse, Toulouse, France
- Luke T Dunning
  - Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Adriana Alberti
  - Genoscope, CEA - Institut de biologie François-Jacob, Evry Cedex, France
- Guillaume Besnard
  - Laboratoire Évolution and Diversité Biologique (EDB, UMR5174), CNRS, UPS, IRD, Université de Toulouse, Toulouse, France
62
Effects of missing data and data type on phylotranscriptomic analysis of stony corals (Cnidaria: Anthozoa: Scleractinia). Mol Phylogenet Evol 2019; 134:12-23. [DOI: 10.1016/j.ympev.2019.01.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/29/2018] [Revised: 01/11/2019] [Accepted: 01/17/2019] [Indexed: 01/28/2023]
63
Shin S, Clarke DJ, Lemmon AR, Moriarty Lemmon E, Aitken AL, Haddad S, Farrell BD, Marvaldi AE, Oberprieler RG, McKenna DD. Phylogenomic Data Yield New and Robust Insights into the Phylogeny and Evolution of Weevils. Mol Biol Evol 2019; 35:823-836. [PMID: 29294021 DOI: 10.1093/molbev/msx324] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 01/01/2023] Open
Abstract
The phylogeny and evolution of weevils (the beetle superfamily Curculionoidea) has been extensively studied, but many relationships, especially in the large family Curculionidae (true weevils; > 50,000 species), remain uncertain. We used phylogenomic methods to obtain DNA sequences from 522 protein-coding genes for representatives of all families of weevils and all subfamilies of Curculionidae. Most of our phylogenomic results had strong statistical support, and the inferred relationships were generally congruent with those reported in previous studies, but with some interesting exceptions. Notably, the backbone relationships of the weevil phylogeny were consistently strongly supported, and the former Nemonychidae (pine flower snout beetles) were polyphyletic, with the subfamily Cimberidinae (here elevated to Cimberididae) placed as sister group of all other weevils. The clade comprising the sister families Brentidae (straight-snouted weevils) and Curculionidae was maximally supported and the composition of both families was firmly established. The contributions of substitution modeling, codon usage and/or mutational bias to differences between trees reconstructed from amino acid and nucleotide sequences were explored. A reconstructed timetree for weevils is consistent with a Mesozoic radiation of gymnosperm-associated taxa to form most extant families and diversification of Curculionidae alongside flowering plants (first monocots, then other groups) beginning in the Cretaceous.
Affiliation(s)
- Seunggwan Shin
- Department of Biological Sciences, University of Memphis, Memphis, TN
| | - Dave J Clarke
- Department of Biological Sciences, University of Memphis, Memphis, TN
| | - Alan R Lemmon
- Department of Scientific Computing, Florida State University, Tallahassee, FL
| | | | | | - Stephanie Haddad
- Department of Biological Sciences, University of Memphis, Memphis, TN
| | - Brian D Farrell
- Museum of Comparative Zoology, Harvard University, Cambridge, MA
| | - Adriana E Marvaldi
- CONICET, División Entomología, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina
| | | | - Duane D McKenna
- Department of Biological Sciences, University of Memphis, Memphis, TN
| |
|
64
|
Parks MB, Wickett NJ, Alverson AJ. Signal, Uncertainty, and Conflict in Phylogenomic Data for a Diverse Lineage of Microbial Eukaryotes (Diatoms, Bacillariophyta). Mol Biol Evol 2019; 35:80-93. [PMID: 29040712 PMCID: PMC5850769 DOI: 10.1093/molbev/msx268] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Diatoms (Bacillariophyta) are a species-rich group of eukaryotic microbes diverse in morphology, ecology, and metabolism. Previous reconstructions of the diatom phylogeny based on one or a few genes have resulted in inconsistent resolution or low support for critical nodes. We applied phylogenetic paralog pruning techniques to a data set of 94 diatom genomes and transcriptomes to infer perennially difficult species relationships, using concatenation and summary-coalescent methods to reconstruct species trees from data sets spanning a wide range of thresholds for taxon and column occupancy in gene alignments. Conflicts between gene and species trees decreased with both increasing taxon occupancy and bootstrap cutoffs applied to gene trees. Concordance between gene and species trees was lowest for short internodes and increased logarithmically with increasing edge length, suggesting that incomplete lineage sorting disproportionately affects species tree inference at short internodes, which are a common feature of the diatom phylogeny. Although species tree topologies were largely consistent across many data treatments, concatenation methods appeared to outperform summary-coalescent methods for sparse alignments. Our results underscore that approaches to species-tree inference based on few loci are likely to be misled by unrepresentative sampling of gene histories, particularly in lineages that may have diversified rapidly. In addition, phylogenomic studies of diatoms, and potentially other hyperdiverse groups, should maximize the number of gene trees with high taxon occupancy, though there is clearly a limit to how many of these genes will be available.
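The taxon-occupancy thresholds mentioned in this abstract can be illustrated with a minimal Python sketch that keeps only the gene alignments in which a minimum fraction of the sampled taxa is present. The function name, the dictionary layout, the toy sequences, and the 0.75 cutoff are all illustrative assumptions; none of them is taken from the paper.

```python
def filter_by_taxon_occupancy(alignments, all_taxa, min_occupancy):
    """Return the genes whose alignments contain at least `min_occupancy`
    (a fraction between 0 and 1) of the taxa listed in `all_taxa`.

    `alignments` maps gene name -> {taxon name -> aligned sequence}.
    This layout is a hypothetical one chosen for the example.
    """
    total = len(all_taxa)
    kept = {}
    for gene, seqs in alignments.items():
        present = sum(1 for taxon in all_taxa if taxon in seqs)
        if total and present / total >= min_occupancy:
            kept[gene] = seqs
    return kept


# Hypothetical toy data: three genes sampled across four taxa.
taxa = ["A", "B", "C", "D"]
genes = {
    "gene1": {"A": "ACGT", "B": "ACGA", "C": "ACGG", "D": "ACGC"},
    "gene2": {"A": "TTGA", "B": "TTGC"},
    "gene3": {"A": "GGCA", "B": "GGCT", "C": "GGCC"},
}
kept = filter_by_taxon_occupancy(genes, taxa, 0.75)
print(sorted(kept))  # gene2 covers only 2 of 4 taxa and is dropped
```

Raising the cutoff trades the number of retained genes against the completeness of each gene tree, which is the tension the abstract describes.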
Affiliation(s)
- Matthew B Parks
- Daniel F. and Ada L. Rice Plant Conservation Science Center, Chicago Botanic Garden, Glencoe, IL
| | - Norman J Wickett
- Daniel F. and Ada L. Rice Plant Conservation Science Center, Chicago Botanic Garden, Glencoe, IL
| | - Andrew J Alverson
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR
| |
|
65
|
Montingelli GG, Grazziotin FG, Battilana J, Murphy RW, Zhang Y, Zaher H. Higher‐level phylogenetic affinities of the Neotropical genus Mastigodryas Amaral, 1934 (Serpentes: Colubridae), species‐group definition and description of a new genus for Mastigodryas bifossatus. J ZOOL SYST EVOL RES 2019. [DOI: 10.1111/jzs.12262] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Affiliation(s)
- Giovanna G. Montingelli
- Department of Life Sciences, Natural History Museum, London, UK
- Museu de Zoologia da Universidade de São Paulo, São Paulo, Brazil
| | | | | | - Robert W. Murphy
- Royal Ontario Museum, Centre for Biodiversity and Conservation Biology, Toronto, Ontario, Canada
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Kunming, China
| | - Ya‐Ping Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Kunming, China
- Laboratory for Conservation and Utilization of Bio‐Resources, Yunnan University, Kunming, China
| | - Hussam Zaher
- Museu de Zoologia da Universidade de São Paulo, São Paulo, Brazil
| |
|
66
|
White DM, Islam MB, Mason-Gamer RJ. Phylogenetic inference in section Archerythroxylum informs taxonomy, biogeography, and the domestication of coca (Erythroxylum species). AMERICAN JOURNAL OF BOTANY 2019; 106:154-165. [PMID: 30629286 DOI: 10.1002/ajb2.1224] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/14/2018] [Accepted: 10/19/2018] [Indexed: 05/12/2023]
Abstract
PREMISE OF THE STUDY This investigation establishes the first DNA-sequence-based phylogenetic hypothesis of species relationships in the coca family (Erythroxylaceae) and presents its implications for the intrageneric taxonomy and neotropical biogeography of Erythroxylum. We also identify the closest wild relatives and evolutionary relationships of the cultivated coca taxa. METHODS We focused our phylogenomic inference on the largest taxonomic section in the genus Erythroxylum (Archerythroxylum O.E.Schulz) using concatenation and gene tree reconciliation methods from hybridization-based target capture of 427 genes. KEY RESULTS We show that neotropical Erythroxylum are monophyletic within the paleotropical lineages, yet Archerythroxylum and all of the other taxonomic sections from which we sampled multiple species lack monophyly. We mapped phytogeographic states onto the tree and found some concordance between these regions and clades. The wild species E. gracilipes and E. cataractarum are most closely related to the cultivated E. coca and E. novogranatense, but relationships within this "coca" clade remain equivocal. CONCLUSIONS Our results point to the difficulty of morphology-based intrageneric classification in this clade and highlight the importance of integrative taxonomy in future systematic revisions. We can confidently identify E. gracilipes and E. cataractarum as the closest wild relatives of the coca taxa, but understanding the domestication history of this crop will require more thorough phylogeographic analysis.
Affiliation(s)
- Dawson M White
- Department of Biological Sciences, University of Illinois at Chicago, 845 West Taylor Street Room 3256 (M/C 066), Chicago, IL, 60612, USA
- Department of Science and Education, Field Museum of Natural History, 1400 South Lake Shore Drive, Chicago, IL, 60605, USA
| | - Melissa B Islam
- Department of Ecology and Evolutionary Biology, University of Colorado, Ramaley N122, Campus Box 334, Boulder, CO, 80309, USA
| | - Roberta J Mason-Gamer
- Department of Biological Sciences, University of Illinois at Chicago, 845 West Taylor Street Room 3256 (M/C 066), Chicago, IL, 60612, USA
| |
|
67
|
Liu L, Anderson C, Pearl D, Edwards SV. Modern Phylogenomics: Building Phylogenetic Trees Using the Multispecies Coalescent Model. Methods Mol Biol 2019; 1910:211-239. [PMID: 31278666 DOI: 10.1007/978-1-4939-9074-0_7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
The multispecies coalescent (MSC) model provides a compelling framework for building phylogenetic trees from multilocus DNA sequence data. The pure MSC is best thought of as a special case of so-called "multispecies network coalescent" models, in which gene flow is allowed among branches of the tree, whereas MSC methods assume there is no gene flow between diverging species. Early implementations of the MSC, such as "parsimony" or "democratic vote" approaches to combining information from multiple gene trees, as well as concatenation, in which DNA sequences from multiple gene trees are combined into a single "supergene," were quickly shown to be inconsistent in some regions of tree space, in so far as they converged on the incorrect species tree as more gene trees and sequence data were accumulated. The anomaly zone, a region of tree space in which the most frequent gene tree is different from the species tree, is one such region where many so-called "coalescent" methods are inconsistent. Second-generation implementations of the MSC employed Bayesian or likelihood models; these are consistent in all regions of gene tree space, but Bayesian methods in particular are incapable of handling the large phylogenomic data sets currently available. Two-step methods, such as MP-EST and ASTRAL, in which gene trees are first estimated and then combined to estimate an overarching species tree, are currently popular in part because they can handle large phylogenomic data sets. These methods are consistent in the anomaly zone but can sometimes provide inappropriate measures of tree support or apportion error and signal in the data inappropriately. MP-EST in particular employs a likelihood model which can be conveniently manipulated to perform statistical tests of competing species trees, incorporating the likelihood of the collected gene trees on each species tree in a likelihood ratio test. 
Such tests provide a useful alternative to the multilocus bootstrap, which only indirectly tests the appropriateness of competing species trees. We illustrate these tests and implementations of the MSC with examples and suggest that MSC methods are a useful class of models effectively using information from multiple loci to build phylogenetic trees.
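To make the idea of gene trees disagreeing with the species tree concrete, here is a minimal Python sketch of the standard three-taxon result under the multispecies coalescent. For a species tree ((A,B),C) whose internal branch has length t in coalescent units, the matching gene tree topology has probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The function name and the example branch lengths are assumptions made for illustration, and the sketch is not a reimplementation of MP-EST, ASTRAL, or any other method named in the abstract.

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities under the multispecies coalescent
    for the species tree ((A,B),C), where `t` is the internal branch
    length in coalescent units (one sampled lineage per species)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


# Hypothetical branch lengths: shorter branches give more discordance.
for t in (0.1, 1.0, 3.0):
    probs = three_taxon_gene_tree_probs(t)
    print(t, {topology: round(p, 3) for topology, p in probs.items()})
```

With three taxa the concordant topology is always at least as probable as either alternative; the anomaly zone described in the abstract only arises with more taxa.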
Affiliation(s)
- Liang Liu
- Department of Statistics, University of Georgia, Athens, GA, USA
| | | | - Dennis Pearl
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
| | - Scott V Edwards
- Department of Organismic and Evolutionary Biology & Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA.
| |
|
68
|
Carlsen MM, Fér T, Schmickl R, Leong-Škorničková J, Newman M, Kress WJ. Resolving the rapid plant radiation of early diverging lineages in the tropical Zingiberales: Pushing the limits of genomic data. Mol Phylogenet Evol 2018; 128:55-68. [DOI: 10.1016/j.ympev.2018.07.020] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2017] [Revised: 07/23/2018] [Accepted: 07/26/2018] [Indexed: 01/09/2023]
|
69
|
Sayyari E, Whitfield JB, Mirarab S. Fragmentary Gene Sequences Negatively Impact Gene Tree and Species Tree Reconstruction. Mol Biol Evol 2018; 34:3279-3291. [PMID: 29029241 DOI: 10.1093/molbev/msx261] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
Species tree reconstruction from genome-wide data is increasingly being attempted, in most cases using a two-step approach of first estimating individual gene trees and then summarizing them to obtain a species tree. The accuracy of this approach, which promises to account for gene tree discordance, depends on the quality of the inferred gene trees. At the same time, phylogenomic and phylotranscriptomic analyses typically use involved bioinformatics pipelines for data preparation. Errors and shortcomings resulting from these preprocessing steps may impact the species tree analyses at the other end of the pipeline. In this article, we first show that the presence of fragmentary data for some species in a gene alignment, as often seen on real data, can result in substantial deterioration of gene trees, and as a result, the species tree. We then investigate a simple filtering strategy where individual fragmentary sequences are removed from individual genes but the rest of the gene is retained. Both in simulations and by reanalyzing a large insect phylotranscriptomic data set, we show the effectiveness of this simple filtering strategy.
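The filtering strategy described in this abstract (drop individual fragmentary sequences, keep the rest of the gene) can be sketched in Python as follows. The abstract does not say how "fragmentary" is defined, so the non-gap fraction criterion, the 0.5 threshold, and the function name here are assumptions chosen for illustration only.

```python
def remove_fragmentary_sequences(alignment, min_fraction, gap_chars="-?"):
    """Drop sequences whose non-gap content is below `min_fraction` of the
    alignment length, retaining every other sequence of the gene.

    `alignment` maps taxon name -> aligned sequence (equal lengths assumed).
    """
    kept = {}
    for taxon, seq in alignment.items():
        if not seq:
            continue  # an empty row carries no data at all
        non_gap = sum(1 for ch in seq if ch not in gap_chars)
        if non_gap / len(seq) >= min_fraction:
            kept[taxon] = seq
    return kept


# Hypothetical toy alignment: sp2 is mostly gaps and is removed,
# while the gene itself is kept with the remaining two sequences.
gene = {
    "sp1": "ACGTACGTAC",
    "sp2": "AC--------",
    "sp3": "ACGTAC-TAC",
}
filtered = remove_fragmentary_sequences(gene, 0.5)
print(sorted(filtered))
```

Filtering per sequence rather than per gene preserves the locus for the taxa that are well represented, at the cost of a more incomplete gene tree.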
Affiliation(s)
- Erfan Sayyari
- Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA
| | | | - Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA
| |
|
70
|
Gates DJ, Pilson D, Smith SD. Filtering of target sequence capture individuals facilitates species tree construction in the plant subtribe Iochrominae (Solanaceae). Mol Phylogenet Evol 2018; 123:26-34. [DOI: 10.1016/j.ympev.2018.02.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2017] [Revised: 01/30/2018] [Accepted: 02/01/2018] [Indexed: 10/18/2022]
|
71
|
Nute M, Chou J, Molloy EK, Warnow T. The performance of coalescent-based species tree estimation methods under models of missing data. BMC Genomics 2018; 19:286. [PMID: 29745854 PMCID: PMC5998899 DOI: 10.1186/s12864-018-4619-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer, that result in gene trees that differ from each other and from the species phylogeny. Methods to estimate species trees in the presence of gene tree discord due to incomplete lineage sorting have been developed and proved to be statistically consistent when gene tree discord is due only to incomplete lineage sorting and every gene tree includes the full set of species. RESULTS We establish statistical consistency of certain coalescent-based species tree estimation methods under some models of taxon deletion from genes. We also evaluate the impact of missing data on four species tree estimation methods (ASTRAL-II, ASTRID, MP-EST, and SVDquartets) using simulated datasets with varying levels of incomplete lineage sorting, gene tree estimation error, and degrees/patterns of missing data. CONCLUSIONS All the species tree estimation methods improved in accuracy as the number of genes increased and often produced highly accurate species trees even when the amount of missing data was large. These results together indicate that accurate species tree estimation is possible under a variety of conditions, even when there are substantial amounts of missing data.
Affiliation(s)
- Michael Nute
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S. Wright St., Champaign, IL, 61820 USA
| | - Jed Chou
- Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green St., Urbana, IL, 61801 USA
| | - Erin K. Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL, 61801 USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL, 61801 USA
| |
|
72
|
Dobrin BH, Zwickl DJ, Sanderson MJ. The prevalence of terraced treescapes in analyses of phylogenetic data sets. BMC Evol Biol 2018; 18:46. [PMID: 29618314 PMCID: PMC5885316 DOI: 10.1186/s12862-018-1162-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2017] [Accepted: 03/22/2018] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. RESULTS Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. CONCLUSIONS If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. 
Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
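The abstract defines taxon coverage density as the proportion of taxon-by-gene positions with any data present. A minimal Python sketch of that calculation follows; the boolean-matrix representation, the function name, and the toy matrix are assumptions made for the example rather than details from the study.

```python
def taxon_coverage_density(presence):
    """Proportion of taxon-by-gene cells that contain any data.

    `presence` is a list of rows (one per taxon), each a list of booleans
    (one per gene) marking whether that taxon has data for that gene.
    """
    n_cells = sum(len(row) for row in presence)
    if n_cells == 0:
        raise ValueError("presence matrix is empty")
    filled = sum(1 for row in presence for cell in row if cell)
    return filled / n_cells


# Hypothetical 3-taxon by 4-gene data set with 9 of 12 cells filled.
matrix = [
    [True, True, True, True],
    [True, False, True, True],
    [True, True, False, False],
]
density = taxon_coverage_density(matrix)
print(density)  # 0.75
```

A value such as this one falls below the 0.90 level at which the abstract reports terraces in nearly all surveyed data sets, though coverage density alone does not determine whether a terrace exists.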
Affiliation(s)
- Barbara H. Dobrin
- Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell St, Tucson, AZ 85721 USA
| | - Derrick J. Zwickl
- Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell St, Tucson, AZ 85721 USA
| | - Michael J. Sanderson
- Department of Ecology and Evolutionary Biology, University of Arizona, 1041 E. Lowell St, Tucson, AZ 85721 USA
| |
|
73
|
Brower AVZ, Garzón-Orduña IJ. Missing data, clade support and "reticulation": the molecular systematics of Heliconius and related genera (Lepidoptera: Nymphalidae) re-examined. Cladistics 2018; 34:151-166. [PMID: 34645081 DOI: 10.1111/cla.12198] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/03/2017] [Indexed: 11/30/2022] Open
Abstract
Kozak et al. (2015, Syst. Biol., 64: 505) portrayed the inference of evolutionary history among Heliconius and allied butterfly genera as a particularly difficult problem for systematics due to prevalent gene conflict caused by interspecific reticulation. To control for this, Kozak et al. conducted a series of multispecies coalescent phylogenetic analyses that they claimed revealed pervasive conflict among markers, but ultimately chose as their preferred hypothesis a phylogenetic tree generated by the traditional supermatrix approach. Intrigued by this seemingly contradictory set of conclusions, we conducted further analyses focusing on two prevalent aspects of the data set: missing data and the uneven contribution of phylogenetic signal among markers. Here, we demonstrate that Kozak et al. overstated their findings of reticulation and that evidence of gene-tree conflict is largely lacking. The distribution of intrinsic homoplasy and incongruence homoplasy in their data set does not follow the pattern expected if phylogenetic history had been obscured by pervasive horizontal gene flow; in fact, noise within individual gene partitions is ten times higher than the incongruence among gene partitions. We show that the patterns explained by Kozak et al. as a result of reticulation can be accounted for by missing data and homoplasy. We also find that although the preferred topology is resilient to missing data, measures of support are sensitive to, and strongly eroded by too many empty cells in the data matrix. Perhaps more importantly, we show that when some taxa are missing almost all characters, adding more genes to the data set provides little or no increase in support for the tree.
Affiliation(s)
- Andrew V Z Brower
- Evolution and Ecology Group, Department of Biology, Middle Tennessee State University, Murfreesboro, TN, USA
| | - Ivonne J Garzón-Orduña
- Evolution and Ecology Group, Department of Biology, Middle Tennessee State University, Murfreesboro, TN, USA
| |
|
74
|
Christensen S, Molloy EK, Vachaspati P, Warnow T. OCTAL: Optimal Completion of gene trees in polynomial time. Algorithms Mol Biol 2018; 13:6. [PMID: 29568323 PMCID: PMC5853121 DOI: 10.1186/s13015-018-0124-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2017] [Accepted: 03/06/2018] [Indexed: 12/16/2022] Open
Abstract
Background For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable. Results We introduce the Optimal Tree Completion problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. We present OCTAL, an algorithm that finds an optimal solution to this problem when the distance between trees is defined using the Robinson–Foulds (RF) distance, and we prove that OCTAL runs in $O(n^2)$ time, where n is the total number of species. We report on a simulation study in which gene trees can differ from the species tree due to incomplete lineage sorting, and estimated gene trees are completed using OCTAL with a reference tree based on a species tree estimated from the multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach in ASTRAL-II, but the accuracy of a completed gene tree computed by OCTAL depends on how topologically similar the reference tree (typically an estimated species tree) is to the true gene tree. Conclusions OCTAL is a useful technique for adding missing taxa to incomplete gene trees and provides good accuracy under a wide range of model conditions. However, results show that OCTAL’s accuracy can be reduced when incomplete lineage sorting is high, as the reference tree can be far from the true gene tree. Hence, this study suggests that OCTAL would benefit from using other types of reference trees instead of species trees when there are large topological distances between true gene trees and species trees. Electronic supplementary material The online version of this article (10.1186/s13015-018-0124-5) contains supplementary material, which is available to authorized users.
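The Robinson-Foulds (RF) distance that this abstract uses as its optimization criterion counts the bipartitions found in one tree but not the other. The Python sketch below computes it for two trees on the same leaf set. It is not the OCTAL algorithm itself, and the nested-tuple tree encoding, the function names, and the example trees are assumptions made for illustration.

```python
def leaf_set(tree):
    """Return the set of leaf names in a tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions (splits) of a tree."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # fixed leaf used to orient every split
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - leaves
        # Keep only splits with at least two leaves on each side.
        if len(leaves) > 1 and len(other) > 1:
            # Store the side without the reference leaf, so the same
            # edge is recorded identically however the tree is rooted.
            splits.add(other if reference in leaves else leaves)
        return leaves

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon trees.
t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "C"), ("B", "D"), "E")
t3 = (("A", "B"), ("C", "E"), "D")
print(rf_distance(t1, t1), rf_distance(t1, t3), rf_distance(t1, t2))
```

Because the distance is only defined here for trees on identical leaf sets, an incomplete gene tree must first be completed (the problem the abstract addresses) before it can be compared with a reference tree in this way.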
|
75
|
Affiliation(s)
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
| |
|
76
|
Rodriguez J, Jones TH, Sierwald P, Marek PE, Shear WA, Brewer MS, Kocot KM, Bond JE. Step-wise evolution of complex chemical defenses in millipedes: a phylogenomic approach. Sci Rep 2018; 8:3209. [PMID: 29453332 PMCID: PMC5816663 DOI: 10.1038/s41598-018-19996-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2017] [Accepted: 01/11/2018] [Indexed: 11/19/2022] Open
Abstract
With fossil representatives from the Silurian capable of respiring atmospheric oxygen, millipedes are among the oldest terrestrial animals, and likely the first to acquire diverse and complex chemical defenses against predators. Exploring the origin of complex adaptive traits is critical for understanding the evolution of Earth's biological complexity, and chemical defense evolution serves as an ideal study system. The classic explanation for the evolution of complexity is by gradual increase from simple to complex, passing through intermediate "stepping stone" states. Here we present the first phylogenetic-based study of the evolution of complex chemical defenses in millipedes by generating the largest genomic-based phylogenetic dataset ever assembled for the group. Our phylogenomic results demonstrate that chemical complexity shows a clear pattern of escalation through time. New pathways are added in a stepwise pattern, leading to greater chemical complexity, independently in a number of derived lineages. This complexity gradually increased through time, leading to the advent of three distantly related chemically complex evolutionary lineages, each uniquely characteristic of each of the respective millipede groups.
Affiliation(s)
- Juanita Rodriguez
- Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA
- CSIRO, Australian National Insect Collection, Canberra, ACT, 2601, Australia
| | - Tappey H Jones
- Department of Chemistry, Virginia Military Institute, Lexington, VA, 24450, USA
| | - Petra Sierwald
- Zoology Department, The Field Museum, Chicago, IL, 60605, USA
| | - Paul E Marek
- Department of Entomology, Virginia Tech, Blacksburg, VA, 24061, USA
| | - William A Shear
- Biology Department, Hampden-Sydney College, Farmville, VA, 23943, USA
| | - Michael S Brewer
- Department of Biology, East Carolina University, Greenville, NC, 27858, USA
| | - Kevin M Kocot
- Department of Biological Sciences, University of Alabama, Tuscaloosa, AL, 35487, USA
| | - Jason E Bond
- Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA.
| |
|
77
|
Blom MPK, Bragg JG, Potter S, Moritz C. Accounting for Uncertainty in Gene Tree Estimation: Summary-Coalescent Species Tree Inference in a Challenging Radiation of Australian Lizards. Syst Biol 2018; 66:352-366. [PMID: 28039387 DOI: 10.1093/sysbio/syw089] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2015] [Accepted: 09/27/2016] [Indexed: 11/12/2022] Open
Abstract
Accurate gene tree inference is an important aspect of species tree estimation in a summary-coalescent framework. Yet, in empirical studies, inferred gene trees differ in accuracy due to stochastic variation in phylogenetic signal between targeted loci. Empiricists should, therefore, examine the consistency of species tree inference, while accounting for the observed heterogeneity in gene tree resolution of phylogenomic data sets. Here, we assess the impact of gene tree estimation error on summary-coalescent species tree inference by screening ${\sim}2000$ exonic loci based on gene tree resolution prior to phylogenetic inference. We focus on a phylogenetically challenging radiation of Australian lizards (genus Cryptoblepharus, Scincidae) and explore effects on topology and support. We identify a well-supported topology based on all loci and find that a relatively small number of high-resolution gene trees can be sufficient to converge on the same topology. Adding gene trees with decreasing resolution produced a generally consistent topology, and increased confidence for specific bipartitions that were poorly supported when using a small number of informative loci. This corroborates coalescent-based simulation studies that have highlighted the need for a large number of loci to confidently resolve challenging relationships and refutes the notion that low-resolution gene trees introduce phylogenetic noise. Further, our study also highlights the value of quantifying changes in nodal support across locus subsets of increasing size (but decreasing gene tree resolution). Such detailed analyses can reveal anomalous fluctuations in support at some nodes, suggesting the possibility of model violation. By characterizing the heterogeneity in phylogenetic signal among loci, we can account for uncertainty in gene tree estimation and assess its effect on the consistency of the species tree estimate. 
We suggest that the evaluation of gene tree resolution should be incorporated in the analysis of empirical phylogenomic data sets. This will ultimately increase our confidence in species tree estimation using summary-coalescent methods and enable us to exploit genomic data for phylogenetic inference. [Coalescence; concatenation; Cryptoblepharus; exon capture; gene tree; phylogenomics; species tree.].
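The screening design described in this abstract, in which loci are ranked by gene tree resolution and species trees are inferred from subsets of increasing size, can be sketched in Python as below. How resolution is scored is not specified in the abstract, so the per-locus scores, the step size, and the function name are hypothetical; the sketch only builds the nested subsets and does not run any species tree inference.

```python
def nested_locus_subsets(resolution_by_locus, step):
    """Rank loci from most to least resolved and return nested subsets of
    increasing size (so average resolution decreases as subsets grow).

    `resolution_by_locus` maps locus name -> resolution score, where a
    higher score means a better-resolved gene tree (assumed convention).
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    ranked = sorted(
        resolution_by_locus,
        key=lambda locus: (-resolution_by_locus[locus], locus),
    )
    if not ranked:
        return []
    # Grow by `step` loci at a time and always finish with the full set.
    sizes = list(range(step, len(ranked), step)) + [len(ranked)]
    return [ranked[:size] for size in sizes]


# Hypothetical resolution scores for five loci.
scores = {"L1": 0.9, "L2": 0.4, "L3": 0.7, "L4": 0.2, "L5": 0.6}
subsets = nested_locus_subsets(scores, 2)
for subset in subsets:
    print(len(subset), subset)
```

Each subset would then be passed to a summary-coalescent method, and nodal support compared across subsets, which is the kind of comparison the abstract recommends.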
Affiliation(s)
- Mozes P K Blom
- Research School of Biology, Australian National University, Canberra ACT 0200, Australia
| | - Jason G Bragg
- Research School of Biology, Australian National University, Canberra ACT 0200, Australia
| | - Sally Potter
- Research School of Biology, Australian National University, Canberra ACT 0200, Australia
| | - Craig Moritz
- Research School of Biology, Australian National University, Canberra ACT 0200, Australia
| |
|
78
|
Tibiriçá Y, Pola M, Cervera JL. Systematics of the genus Halgerda Bergh, 1880 (Heterobranchia : Nudibranchia) of Mozambique with descriptions of six new species. INVERTEBR SYST 2018. [DOI: 10.1071/is17095] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
Abstract
The species of the genus Halgerda Bergh, 1880, are restricted to the Indo-Pacific; some being common inhabitants of reefs off the coast of Mozambique. These species have been relatively well studied morphologically, but few molecular data are available. During a seven-year period surveying the reefs of Mozambique, 11 Halgerda spp. were collected, six of which are described here. We provide details on their morphology, anatomy, novel genetic markers and additional information about their colour variation. The new species described herein are Halgerda leopardalis, sp. nov., H. mozambiquensis, sp. nov., H. jennyae, sp. nov., H. meringuecitrea, sp. nov., H. nuarroensis, sp. nov. and H. indotessellata, sp. nov., the last of which was found to be a pseudocryptic species of H. tessellata. Moreover, we identified two species complexes, one composed mainly of specimens from the Western Indian Ocean and another with specimens mostly from the Pacific Ocean and Western Australia.
|
79
|
Damerau M, Freese M, Hanel R. Multi-gene phylogeny of jacks and pompanos (Carangidae), including placement of monotypic vadigo Campogramma glaycos. JOURNAL OF FISH BIOLOGY 2018; 92:190-202. [PMID: 29193148 DOI: 10.1111/jfb.13509] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/04/2017] [Accepted: 10/27/2017] [Indexed: 06/07/2023]
Abstract
In this study, the phylogenetic trees of jacks and pompanos (Carangidae), an ecologically and morphologically diverse, globally distributed fish family, are inferred from a complete, concatenated data set of two mitochondrial (cytochrome c oxidase I, cytochrome b) loci and one nuclear (myosin heavy chain 6) locus. Maximum likelihood and Bayesian inferences are largely congruent and show a clear separation of Carangidae into the four subfamilies: Scomberoidinae, Trachinotinae, Naucratinae and Caranginae. The inclusion of the carangid sister lineages Coryphaenidae (dolphinfishes) and Rachycentridae (cobia), however, renders Carangidae paraphyletic. The phylogenetic trees also show with high statistical support that the monotypic vadigo Campogramma glaycos is the sister to all other species within the Naucratinae.
Affiliation(s)
- M Damerau
- Johann Heinrich von Thünen Institute, Thünen Institute of Fisheries Ecology, Palmaille 9, 22767, Hamburg, Germany
| | - M Freese
- Johann Heinrich von Thünen Institute, Thünen Institute of Fisheries Ecology, Palmaille 9, 22767, Hamburg, Germany
| | - R Hanel
- Johann Heinrich von Thünen Institute, Thünen Institute of Fisheries Ecology, Palmaille 9, 22767, Hamburg, Germany
| |
|
80
|
Mallo D, Posada D. Multilocus inference of species trees and DNA barcoding. Philos Trans R Soc Lond B Biol Sci 2017; 371:rstb.2015.0335. [PMID: 27481787 PMCID: PMC4971187 DOI: 10.1098/rstb.2015.0335] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Accepted: 04/10/2016] [Indexed: 11/30/2022] Open
Abstract
The unprecedented amount of data resulting from next-generation sequencing has opened a new era in phylogenetic estimation. Although large datasets should, in theory, increase phylogenetic resolution, massive, multilocus datasets have uncovered a great deal of phylogenetic incongruence among different genomic regions, due both to stochastic error and to the action of different evolutionary processes such as incomplete lineage sorting, gene duplication and loss and horizontal gene transfer. This incongruence violates one of the fundamental assumptions of the DNA barcoding approach, which assumes that gene history and species history are identical. In this review, we explain some of the most important challenges we will have to face to reconstruct the history of species, and the advantages and disadvantages of different strategies for the phylogenetic analysis of multilocus data. In particular, we describe the evolutionary events that can generate species tree/gene tree discordance, compare the most popular methods for species tree reconstruction, highlight the challenges we need to face when using them and discuss their potential utility in barcoding. Current barcoding methods sacrifice a great amount of statistical power by only considering one locus, and a transition to multilocus barcodes would not only improve current barcoding methods, but also facilitate an eventual transition to species-tree-based barcoding strategies, which could better accommodate scenarios where the barcode gap is too small or nonexistent. This article is part of the themed issue ‘From DNA barcodes to biomes’.
Affiliation(s)
- Diego Mallo
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
| | - David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
| |
|
81
|
|
82
|
Molloy EK, Warnow T. To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods. Syst Biol 2017; 67:285-303. [DOI: 10.1093/sysbio/syx077] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Received: 12/08/2016] [Accepted: 09/13/2017] [Indexed: 01/27/2023] Open
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| |
|
83
|
Kates HR, Soltis PS, Soltis DE. Evolutionary and domestication history of Cucurbita (pumpkin and squash) species inferred from 44 nuclear loci. Mol Phylogenet Evol 2017; 111:98-109. [PMID: 28288944 DOI: 10.1016/j.ympev.2017.03.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 09/27/2016] [Revised: 02/28/2017] [Accepted: 03/01/2017] [Indexed: 11/28/2022]
Abstract
Phylogenetics can facilitate the study of plant domestication by resolving sister relationships between crops and their wild relatives, thereby identifying the ancestors of cultivated plants. Previous phylogenetic studies of the six Cucurbita crop lineages (pumpkins and squashes) and their wild relatives suggest histories of deep coalescence that complicate uncovering the genetic origins of the six crop taxa. We investigated the evolution of wild and domesticated Cucurbita using the most comprehensive and robust molecular-based phylogeny for Cucurbita to date based on 44 loci derived from introns of single-copy nuclear genes. We discovered novel relationships among Cucurbita species and recovered the first Cucurbita tree with well-supported resolution within species. Cucurbita comprises a clade of mesophytic annual species that includes all six crop taxa and a grade of xerophytic perennial species that represent the ancestral xerophytic habit of the genus. Based on phylogenetic resolution within-species we hypothesize that the magnitude of domestication bottlenecks varies among Cucurbita crop lineages. Our phylogeny clarifies how wild Cucurbita species are related to the domesticated taxa. We find close relationships between two wild species and crop lineages not previously identified. Expanded geographic sampling of key wild species is needed for improved understanding of the evolution of domesticated Cucurbita.
Affiliation(s)
- Heather R Kates
- Univ Florida, Genet Inst, Gainesville, FL 32611, USA; Univ Florida, Florida Museum Nat Hist, Gainesville, FL 32611, USA.
| | - Pamela S Soltis
- Univ Florida, Genet Inst, Gainesville, FL 32611, USA; Univ Florida, Florida Museum Nat Hist, Gainesville, FL 32611, USA
| | - Douglas E Soltis
- Univ Florida, Genet Inst, Gainesville, FL 32611, USA; Univ Florida, Florida Museum Nat Hist, Gainesville, FL 32611, USA; Univ Florida, Dept Biol, Gainesville, FL 32611, USA
| |
|
84
|
Li X, Jang TS, Temsch EM, Kato H, Takayama K, Schneeweiss GM. Molecular and karyological data confirm that the enigmatic genus Platypholis from Bonin-Islands (SE Japan) is phylogenetically nested within Orobanche (Orobanchaceae). JOURNAL OF PLANT RESEARCH 2017; 130:273-280. [PMID: 28004281 PMCID: PMC5318490 DOI: 10.1007/s10265-016-0888-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/09/2016] [Accepted: 10/26/2016] [Indexed: 05/17/2023]
Abstract
Molecular phylogenetic studies have greatly improved our understanding of phylogenetic relationships of non-photosynthetic parasitic broomrapes (Orobanche and related genera, Orobanchaceae), but a few genera have remained unstudied. One of those is Platypholis, whose sole species, Platypholis boninsimae, is restricted to the Bonin-Islands (Ogasawara Islands) about 1000 km southeast of Japan. Based on overall morphological similarity, Platypholis has been merged with Orobanche, but this hypothesis has never been tested with molecular data. Employing maximum likelihood and Bayesian analyses on a family-wide data set (two plastid markers, matK and rps2, and three nuclear markers, ITS, phyA and phyB) as well as on an ITS data set focusing on Orobanche s. str., it is shown that P. boninsimae Maxim. is phylogenetically closely linked to or even nested within Orobanche s. str. This position is supported both by morphological evidence and by the newly obtained chromosome number of 2n = 38, which is characteristic for the genus Orobanche s. str.
Affiliation(s)
- Xi Li
- Department of Botany and Biodiversity Research, University of Vienna, Rennweg 14, 1030, Vienna, Austria
| | - Tae-Soo Jang
- Department of Botany and Biodiversity Research, University of Vienna, Rennweg 14, 1030, Vienna, Austria
| | - Eva M Temsch
- Department of Botany and Biodiversity Research, University of Vienna, Rennweg 14, 1030, Vienna, Austria
| | - Hidetoshi Kato
- Makino Herbarium, Tokyo Metropolitan University, 1-1 Minami-Ohsawa, Hachioji-shi, Tokyo, 192-0397, Japan
| | - Koji Takayama
- Museum of Natural and Environmental History, Shizuoka, 5762 Oya, Suruga-ku, Shizuoka-shi, Shizuoka, 422-8017, Japan
| | - Gerald M Schneeweiss
- Department of Botany and Biodiversity Research, University of Vienna, Rennweg 14, 1030, Vienna, Austria.
| |
|
85
|
Li X, Hao B, Pan D, Schneeweiss GM. Marker Development for Phylogenomics: The Case of Orobanchaceae, a Plant Family with Contrasting Nutritional Modes. FRONTIERS IN PLANT SCIENCE 2017; 8:1973. [PMID: 29218053 PMCID: PMC5704539 DOI: 10.3389/fpls.2017.01973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2017] [Accepted: 11/01/2017] [Indexed: 05/02/2023]
Abstract
Phylogenomic approaches, employing next-generation sequencing (NGS) techniques, have revolutionized systematic and evolutionary biology. Target enrichment is an efficient and cost-effective method in phylogenomics and is becoming increasingly popular. Depending on availability and quality of reference data as well as on biological features of the study system, (semi-)automated identification of suitable markers will require specific bioinformatic pipelines. Here, we established a highly flexible bioinformatic pipeline, BaitsFinder, to identify putative orthologous single copy genes (SCGs) and to construct bait sequences in a single workflow. Additionally, this pipeline has been constructed to be able to cope with challenging data sets, such as the nutritionally heterogeneous plant family Orobanchaceae. To this end, we used transcriptome data of differing quality available for four Orobanchaceae species and, as reference, SCG data from monkeyflower (Erythranthe guttata, syn. Mimulus g.; 1,915 genes) and tomato (Solanum lycopersicum; 391 genes). Depending on whether gaps were permitted in initial blast searches of the four Orobanchaceae species against the reference, our pipeline identified 1,307 and 981 SCGs with average length of 994 bp and 775 bp, respectively. Automated bait sequence construction (using 2× tiling) resulted in 38,170 and 21,856 bait sequences, respectively. In comparison to the recently published MarkerMiner 1.0 pipeline, BaitsFinder identified about 1.6 times as many SCGs (of at least 900 bp length). Skipping steps specific to analyses of Orobanchaceae, BaitsFinder was successfully used in a group of non-parasitic plants (three Asteraceae species and, as reference, SCG data from Arabidopsis thaliana based on previously compiled SCGs). Thus, BaitsFinder is expected to be broadly applicable in groups where only transcriptomes or partial genome data of differing quality are available.
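The 2× tiling mentioned in this abstract can be illustrated with a short sketch. It is not BaitsFinder's code: the function name is hypothetical, and the bait length of 120 bases is an assumption, since the abstract does not state one. With 2× tiling, consecutive baits start half a bait length apart, so most bases are covered by two baits.

```python
def tile_baits(seq, bait_len=120, tiling=2):
    """Cut seq into overlapping baits; tiling=2 means a step of bait_len // 2."""
    if len(seq) < bait_len:
        return []  # sequence too short to yield a full-length bait
    step = bait_len // tiling
    last_start = len(seq) - bait_len
    starts = list(range(0, last_start + 1, step))
    # Add one final bait flush with the end if the regular steps stop short of it.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [seq[s:s + bait_len] for s in starts]
```

For a 300-base target and 120-base baits, this yields baits starting at positions 0, 60, 120 and 180.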
|
86
|
Shen XX, Zhou X, Kominek J, Kurtzman CP, Hittinger CT, Rokas A. Reconstructing the Backbone of the Saccharomycotina Yeast Phylogeny Using Genome-Scale Data. G3 (BETHESDA, MD.) 2016; 6:3927-3939. [PMID: 27672114 PMCID: PMC5144963 DOI: 10.1534/g3.116.034744] [Citation(s) in RCA: 134] [Impact Index Per Article: 16.8] [Received: 08/18/2016] [Accepted: 09/21/2016] [Indexed: 01/20/2023]
Abstract
Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.
Affiliation(s)
- Xing-Xing Shen
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235
| | - Xiaofan Zhou
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235
| | - Jacek Kominek
- Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin 53706
| | - Cletus P Kurtzman
- Mycotoxin Prevention and Applied Microbiology Research Unit, National Center for Agricultural Utilization Research, Agricultural Research Service, U.S. Department of Agriculture, Peoria, Illinois 61604
| | - Chris Todd Hittinger
- Laboratory of Genetics, Genome Center of Wisconsin, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, Wisconsin 53706
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee 37235
| |
|
87
|
Zhao L, Li X, Zhang N, Zhang SD, Yi TS, Ma H, Guo ZH, Li DZ. Phylogenomic analyses of large-scale nuclear genes provide new insights into the evolutionary relationships within the rosids. Mol Phylogenet Evol 2016; 105:166-176. [DOI: 10.1016/j.ympev.2016.06.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 11/28/2015] [Revised: 06/06/2016] [Accepted: 06/27/2016] [Indexed: 12/28/2022]
|
88
|
Arbizu CI, Ellison SL, Senalik D, Simon PW, Spooner DM. Genotyping-by-sequencing provides the discriminating power to investigate the subspecies of Daucus carota (Apiaceae). BMC Evol Biol 2016; 16:234. [PMID: 27793080 PMCID: PMC5084430 DOI: 10.1186/s12862-016-0806-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/31/2016] [Accepted: 10/14/2016] [Indexed: 12/05/2022] Open
Abstract
BACKGROUND: The majority of the subspecies of Daucus carota have not yet been discriminated clearly by various molecular or morphological methods and hence their phylogeny and classification remain unresolved. Recent studies using 94 nuclear orthologs and morphological characters, and studies employing other molecular approaches were unable to distinguish clearly many of the subspecies. Fertile intercrosses among traditionally recognized subspecies are well documented. We here explore the utility of single nucleotide polymorphisms (SNPs) generated by genotyping-by-sequencing (GBS) to serve as an effective molecular method to discriminate the subspecies of the D. carota complex.
RESULTS: We used GBS to obtain SNPs covering all nine Daucus carota chromosomes from 162 accessions of Daucus and two related genera. To study Daucus phylogeny, we scored a total of 10,814 or 38,920 SNPs with a maximum of 10 or 30 % missing data, respectively. To investigate the subspecies of D. carota, we employed two data sets including 150 accessions: (i) rate of missing data 10 % with a total of 18,565 SNPs, and (ii) rate of missing data 30 %, totaling 43,713 SNPs. Consistent with prior results, the topology of both data sets separated species with 2n = 18 chromosomes from all other species. Our results place all cultivated carrots (D. carota subsp. sativus) in a single clade. The wild members of D. carota from central Asia were on a clade with eastern members of subsp. sativus. The other subspecies of D. carota were in four clades associated with geographic groups: (1) the Balkan Peninsula and the Middle East, (2) North America and Europe, (3) North Africa exclusive of Morocco, and (4) the Iberian Peninsula and Morocco. Daucus carota subsp. maximus was discriminated, but neither it, nor subsp. gummifer (defined in a broad sense) are monophyletic.
CONCLUSIONS: Our study suggests that (1) the morphotypes identified as D. carota subspecies gummifer (as currently broadly circumscribed), all confined to areas near the Atlantic Ocean and the western Mediterranean Sea, have separate origins from sympatric members of other subspecies of D. carota, (2) D. carota subsp. maximus, on two clades with some accessions of subsp. carota, can be distinguished from each other but only with poor morphological support, (3) D. carota subsp. capillifolius, well distinguished morphologically, is an apospecies relative to North African populations of D. carota subsp. carota, (4) the eastern cultivated carrots have origins closer to wild carrots from central Asia than to western cultivated carrots, and (5) large SNP data sets are suitable for species-level phylogenetic studies in Daucus.
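The two missing-data cutoffs in this abstract (10 % and 30 %) amount to a simple filter on SNPs, sketched below. This is an illustrative assumption rather than the authors' GBS pipeline: it treats the cutoff as a per-SNP limit on the fraction of accessions without a call, and the function name and input layout (a dict of SNP identifiers to lists of calls, with None for missing) are hypothetical.

```python
def filter_snps(genotypes, max_missing=0.10):
    """Keep SNPs whose fraction of missing calls is at most max_missing."""
    kept = {}
    for snp_id, calls in genotypes.items():
        if not calls:
            continue  # no accessions scored for this SNP
        missing = sum(call is None for call in calls) / len(calls)
        if missing <= max_missing:
            kept[snp_id] = calls
    return kept
```

Relaxing the cutoff from 0.10 to 0.30 retains more SNPs at the cost of sparser data, which matches the larger SNP counts reported for the 30 % data sets.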
Affiliation(s)
- Carlos I Arbizu
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA
| | - Shelby L Ellison
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA
| | - Douglas Senalik
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA
- USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA
| | - Philipp W Simon
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA
- USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA
| | - David M Spooner
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA.
- USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI, 53706-1590, USA.
| |
|
89
|
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology Harvard University Cambridge MA 02138 USA
| |
|
90
|
Gatesy J, Meredith RW, Janecka JE, Simmons MP, Murphy WJ, Springer MS. Resolution of a concatenation/coalescence kerfuffle: partitioned coalescence support and a robust family‐level tree for Mammalia. Cladistics 2016; 33:295-332. [DOI: 10.1111/cla.12170] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Accepted: 06/30/2016] [Indexed: 12/14/2022] Open
Affiliation(s)
- John Gatesy
- Department of Biology University of California Riverside CA 92521 USA
| | - Robert W. Meredith
- Department of Biology and Molecular Biology Montclair State University Montclair NJ 07043 USA
| | - Jan E. Janecka
- Department of Biological Sciences Duquesne University Pittsburgh PA 15282 USA
| | - Mark P. Simmons
- Department of Biology Colorado State University Fort Collins CO 80523 USA
| | - William J. Murphy
- Department of Veterinary Integrative Biosciences Texas A&M University College Station TX 77843 USA
| | - Mark S. Springer
- Department of Biology University of California Riverside CA 92521 USA
| |
|
91
|
Wu HY, Wang YH, Xie Q, Ke YL, Bu WJ. Molecular classification based on apomorphic amino acids (Arthropoda, Hexapoda): Integrative taxonomy in the era of phylogenomics. Sci Rep 2016; 6:28308. [PMID: 27312960 PMCID: PMC4911608 DOI: 10.1038/srep28308] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/02/2016] [Accepted: 05/31/2016] [Indexed: 11/10/2022] Open
Abstract
With the great development of sequencing technologies and systematic methods, our understanding of evolutionary relationships at deeper levels within the tree of life has greatly improved over the last decade. However, the current taxonomic methodology is insufficient to describe the growing levels of diversity in both a standardised and general way due to the limitations of using only morphological traits to describe clades. Herein, we propose the idea of a molecular classification based on hierarchical and discrete amino acid characters. Clades are classified based on the results of phylogenetic analyses and described using amino acids with group specificity in phylograms. Practices based on the recently published phylogenomic datasets of insects together with 15 de novo sequenced transcriptomes in this study demonstrate that such a methodology can accommodate various higher ranks of taxonomy. Such an approach has the advantage of describing organisms in a standard and discrete way within a phylogenetic framework, thereby facilitating the recognition of clades from the view of the whole lineage, as indicated by PhyloCode. By combining identification keys and phylogenies, the molecular classification based on hierarchical and discrete characters may greatly boost the progress of integrative taxonomy.
Affiliation(s)
- Hao-Yang Wu
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
| | - Yan-Hui Wang
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
- College of Computer and Control Engineering, Nankai University, 38 Tongyan Road, Haihe Education Park, Jinnan District, Tianjin 300350, China
| | - Qiang Xie
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
| | - Yun-Ling Ke
- Guangdong Entomological Institute, Guangzhou 510260, China
| | - Wen-Jun Bu
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
| |
|