1
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
- Joaquim Martins
- Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
- João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil.
2
Juravel K, Porras L, Höhna S, Pisani D, Wörheide G. Exploring genome gene content and morphological analysis to test recalcitrant nodes in the animal phylogeny. PLoS One 2023; 18:e0282444. [PMID: 36952565 PMCID: PMC10035847 DOI: 10.1371/journal.pone.0282444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 02/14/2023] [Indexed: 03/25/2023] Open
Abstract
An accurate phylogeny of animals is needed to clarify their evolution, ecology, and impact on shaping the biosphere. Although datasets of several hundred thousand amino acids are nowadays routinely used to test phylogenetic hypotheses, key deep nodes in the metazoan tree remain unresolved: the root of animals, the root of Bilateria, and the monophyly of Deuterostomia. Instead of using the standard approach of amino acid datasets, we performed analyses of newly assembled genome gene content and morphological datasets to investigate these recalcitrant nodes in the phylogeny of animals. We explored extensively the choices for assembling the genome gene content dataset and model choices of morphological analyses. Our results are robust to these choices and provide additional insights into the early evolution of animals; they are consistent with sponges as the sister group of all the other animals, the worm-like bilaterian lineage Xenacoelomorpha as the sister group of the other Bilateria, and tentatively support monophyletic Deuterostomia.
Affiliation(s)
- Ksenia Juravel
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- Luis Porras
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- Sebastian Höhna
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- GeoBio-Center, Ludwig-Maximilians-Universität München, München, Germany
- Davide Pisani
- Bristol Palaeobiology Group, School of Biological Sciences and School of Earth Sciences, University of Bristol, Bristol, United Kingdom
- Gert Wörheide
- Department of Earth and Environmental Sciences, Paleontology & Geobiology, Ludwig-Maximilians-Universität München, München, Germany
- GeoBio-Center, Ludwig-Maximilians-Universität München, München, Germany
- SNSB-Bayerische Staatssammlung für Paläontologie und Geologie, München, Germany
3
Hovlinc is a recently evolved class of ribozyme found in human lncRNA. Nat Chem Biol 2021; 17:601-607. [PMID: 33753927 DOI: 10.1038/s41589-021-00763-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/13/2020] [Revised: 01/06/2021] [Accepted: 02/02/2021] [Indexed: 01/31/2023]
Abstract
Although naturally occurring catalytic RNA molecules (ribozymes) have attracted a great deal of research interest, very few have been identified in humans. Here, we developed a genome-wide approach to discovering self-cleaving ribozymes and identified a naturally occurring ribozyme in humans. The secondary structure and biochemical properties of this ribozyme indicate that it belongs to an unidentified class of small, self-cleaving ribozymes. The sequence of the ribozyme exhibits a clear evolutionary path, from its appearance between ~130 and ~65 million years ago (Ma), to acquiring self-cleavage activity very recently, ~13-10 Ma, in the common ancestors of humans, chimpanzees and gorillas. The ribozyme appears to be functional in vivo and is embedded within a long noncoding RNA belonging to a class of very long intergenic noncoding RNAs. The presence of a catalytic RNA enzyme in lncRNA creates the possibility that these transcripts could function by carrying catalytic RNA domains.
4
Kimball RT, Hosner PA, Braun EL. A phylogenomic supermatrix of Galliformes (Landfowl) reveals biased branch lengths. Mol Phylogenet Evol 2021; 158:107091. [PMID: 33545275 DOI: 10.1016/j.ympev.2021.107091] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/23/2020] [Revised: 01/16/2021] [Accepted: 01/27/2021] [Indexed: 11/25/2022]
Abstract
Building taxon-rich phylogenies is foundational for macroevolutionary studies. One approach to improve taxon sampling beyond individual studies is to build supermatrices of publicly available data, incorporating taxa sampled across different studies and utilizing different loci. Most existing supermatrix studies have focused on loci commonly sequenced with Sanger technology ("legacy" markers, such as mitochondrial data and small numbers of nuclear loci). However, incorporating phylogenomic studies into supermatrices allows problem nodes to be targeted and resolved with considerable amounts of data, while improving taxon sampling with legacy data. Here we estimate phylogeny from a galliform supermatrix which includes well-known model and agricultural species such as the chicken and turkey. We assembled a supermatrix comprising 4500 ultra-conserved elements (UCEs) collected as part of recent phylogenomic studies in this group and legacy mitochondrial and nuclear (intron and exon) sequences. Our resulting phylogeny included 88% of extant species and recovered well-accepted relationships with strong support. However, branch lengths, which are particularly important in downstream macroevolutionary studies, appeared vastly skewed. Taxa represented only by rapidly evolving mitochondrial data had high proportions of missing data and exhibited long terminal branches. Conversely, taxa sampled for slowly evolving UCEs with low proportions of missing data exhibited substantially shorter terminal branches. We explored several branch length re-estimation methods with particular attention to terminal branches and conclude that re-estimation using well-sampled mitochondrial sequences may be a pragmatic approach to obtain trees suitable for macroevolutionary analysis.
Affiliation(s)
- Rebecca T Kimball
- Department of Biology, University of Florida, Gainesville, FL 32607, USA.
- Peter A Hosner
- Department of Biology, University of Florida, Gainesville, FL 32607, USA; Natural History Museum of Denmark and Center for Macroecology, Evolution and Climate, University of Copenhagen, Copenhagen, Denmark
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL 32607, USA
5
Grant T. Outgroup sampling in phylogenetics: Severity of test and successive outgroup expansion. J Zool Syst Evol Res 2019. [DOI: 10.1111/jzs.12317] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Affiliation(s)
- Taran Grant
- Department of Zoology, Institute of Biosciences University of São Paulo São Paulo Brazil
6
Abstract
It has long been appreciated that analyses of genomic data (e.g., whole genome sequencing or sequence capture) have the potential to reveal the tree of life, but it remains challenging to move from sequence data to a clear understanding of evolutionary history, in part due to the computational challenges of phylogenetic estimation using genome-scale data. Supertree methods solve that challenge because they facilitate a divide-and-conquer approach for large-scale phylogeny inference by integrating smaller subtrees in a computationally efficient manner. Here, we combined information from sequence capture and whole-genome phylogenies using supertree methods. However, the available phylogenomic trees had limited overlap so we used taxon-rich (but not phylogenomic) megaphylogenies to weave them together. This allowed us to construct a phylogenomic supertree, with support values, that included 707 bird species (~7% of avian species diversity). We estimated branch lengths using mitochondrial sequence data and we used these branch lengths to estimate divergence times. Our time-calibrated supertree supports radiation of all three major avian clades (Palaeognathae, Galloanseres, and Neoaves) near the Cretaceous-Paleogene (K-Pg) boundary. The approach we used will permit the continued addition of taxa to this supertree as new phylogenomic data are published, and it could be applied to other taxa as well.
7
Casagranda MD, Goloboff PA. On stability measures and effects of data structure in the recognition of areas of endemism. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Affiliation(s)
- M Dolores Casagranda
- Unidad Ejecutora Lillo/CONICET, Miguel Lillo, San Miguel de Tucumán (CP), Tucumán, Argentina
- Pablo A Goloboff
- Unidad Ejecutora Lillo/CONICET, Miguel Lillo, San Miguel de Tucumán (CP), Tucumán, Argentina
8
Jamil HM. Optimizing Phylogenetic Queries for Performance. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1692-1705. [PMID: 28858810 DOI: 10.1109/tcbb.2017.2743706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
The vast majority of phylogenetic databases do not support declarative querying through which their contents can be flexibly and conveniently accessed, and the template-based query interfaces they support do not allow arbitrary speculative queries. They therefore also do not support query optimization leveraging unique phylogeny properties. While a small number of graph query languages such as XQuery, Cypher, and GraphQL exist for computer-savvy users, most are too general and complex to be useful for biologists, and too inefficient for large phylogeny querying. In this paper, we discuss a recently introduced visual query language, called PhyQL, that leverages phylogeny-specific properties to support essential and powerful constructs for a large class of phylogenetic queries. We develop a range of pruning aids, and propose a substantial set of query optimization strategies using these aids suitable for large phylogeny querying. A hybrid optimization technique that exploits a set of indices and "graphlet" partitioning is discussed. A "fail soonest" strategy is used to avoid hopeless processing and is shown to produce dividends. Possible novel optimization techniques yet to be explored are also discussed.
9
Smith SA, Brown JW. Constructing a broadly inclusive seed plant phylogeny. American Journal of Botany 2018; 105:302-314. [PMID: 29746720 DOI: 10.1002/ajb2.1019] [Citation(s) in RCA: 368] [Impact Index Per Article: 61.3] [Received: 08/08/2017] [Accepted: 10/19/2017] [Indexed: 05/03/2023]
Abstract
PREMISE OF THE STUDY: Large phylogenies can help shed light on macroevolutionary patterns that inform our understanding of fundamental processes that shape the tree of life. These phylogenies also serve as tools that facilitate other systematic, evolutionary, and ecological analyses. Here we combine genetic data from public repositories (GenBank) with phylogenetic data (Open Tree of Life project) to construct a dated phylogeny for seed plants. METHODS: We conducted a hierarchical clustering analysis of publicly available molecular data for major clades within the Spermatophyta. We constructed phylogenies of major clades, estimated divergence times, and incorporated data from the Open Tree of Life project, resulting in a seed plant phylogeny. We estimated diversification rates, excluding those taxa without molecular data. We also summarized topological uncertainty and data overlap for each major clade. KEY RESULTS: The trees constructed for Spermatophyta consisted of 79,881 and 353,185 terminal taxa; the latter included the Open Tree of Life taxa for which we could not include molecular data from GenBank. The diversification analyses demonstrated nested patterns of rate shifts throughout the phylogeny. Data overlap and inference uncertainty show significant variation throughout and demonstrate the continued need for data collection across seed plants. CONCLUSIONS: This study demonstrates a means for combining available resources to construct a dated phylogeny for plants. However, this approach is an early step and more developments are needed to add data, better incorporate underlying uncertainty, and improve resolution. The methods discussed here can also be applied to other major clades in the tree of life.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
- Joseph W Brown
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
10
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- Joaquim Martins
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
- João C Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil.
11
Bohlin L, Cárdenas P, Backlund A, Göransson U. 35 Years of Marine Natural Product Research in Sweden: Cool Molecules and Models from Cold Waters. Progress in Molecular and Subcellular Biology 2017; 55:1-34. [PMID: 28238034 DOI: 10.1007/978-3-319-51284-6_1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
Abstract
Current efforts in marine biodiscovery have essentially focused on temperate to tropical shallow water organisms. With more than 6000 species of marine plants and animals, the Kosterfjord area has the richest marine biodiversity in Swedish waters, but it remains understudied. The overall objective of our marine pharmacognosy research is to explore and reveal the pharmacological potential of organisms from this poorly explored region. More generally, we wish to understand aspects of structure-activity relationships of chemical interactions in cold-water marine environments (shallow and deep). Our strategy is based on an ecologically guided search for compounds through studies of physiology and organism interactions, coupled to identification of bioactive molecules guided especially by in vivo assays. The research programme originated in the beginning of the 1980s with a broad screening of Swedish marine organisms using both in vitro and in vivo assays, resulting in isolation and identification of several different bioactive molecules. Two congenerous cyclopeptides, i.e. barettin and 8,9-dihydrobarettin, were isolated from the deep-sea sponge Geodia barretti, and structurally elucidated, guided by their antifouling activity and their affinity to a selection of human serotonin receptors. To optimize the activity, a number of analogues of barettin were synthesized and tested for antifouling activity. Within the EU project BlueGenics, two larger homologous peptides, barrettides A and B, were isolated from G. barretti. Also, metabolic fingerprinting combined with sponge systematics was used to further study deep-sea natural product diversity in the genus Geodia. Finally, the chemical property space model 'ChemGPS-NP' has been developed and used in our research group, enabling a more efficient use of obtained compounds and exploration of possible biological activities and targets. Another approach is the broad application of phylogenetic frameworks, which can be used to predict where (in which organisms) to search for novel molecules or better sources of known molecules in marine organisms. In a further perspective, the deeper understanding of evolution and development of life on Earth can also provide answers to why marine organisms produce specific molecules.
Affiliation(s)
- Lars Bohlin
- Division of Pharmacognosy, Department of Medicinal Chemistry, Biomedical Center, Uppsala University, Box 574, 751 23, Uppsala, Sweden.
- Paco Cárdenas
- Division of Pharmacognosy, Department of Medicinal Chemistry, Biomedical Center, Uppsala University, Box 574, 751 23, Uppsala, Sweden
- Anders Backlund
- Division of Pharmacognosy, Department of Medicinal Chemistry, Biomedical Center, Uppsala University, Box 574, 751 23, Uppsala, Sweden
- Ulf Göransson
- Division of Pharmacognosy, Department of Medicinal Chemistry, Biomedical Center, Uppsala University, Box 574, 751 23, Uppsala, Sweden.
12
Laing AM, Doyle S, Gold MEL, Nesbitt SJ, O'Leary MA, Turner AH, Wilberg EW, Poole KE. Giant taxon-character matrices: the future of morphological systematics. Cladistics 2017; 34:333-335. [DOI: 10.1111/cla.12197] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Accepted: 02/20/2017] [Indexed: 11/28/2022] Open
Affiliation(s)
- Adam M. Laing
- Department of Anatomical Sciences; HSC T-8 (040); Stony Brook University; Stony Brook NY 11794-8081 USA
- Sharon Doyle
- Interdepartmental Doctoral Program in Anthropological Sciences; Social and Behavioral Sciences Building; Stony Brook University; Stony Brook NY 11794-4364 USA
- Maria Eugenia Leone Gold
- Department of Anatomical Sciences; HSC T-8 (040); Stony Brook University; Stony Brook NY 11794-8081 USA
- Sterling J. Nesbitt
- Department of Geosciences; Derring Hall; Virginia Polytechnic Institute and State University; Blacksburg VA 24061 USA
- Maureen A. O'Leary
- Department of Anatomical Sciences; HSC T-8 (040); Stony Brook University; Stony Brook NY 11794-8081 USA
- Alan H. Turner
- Department of Anatomical Sciences; HSC T-8 (040); Stony Brook University; Stony Brook NY 11794-8081 USA
- Eric W. Wilberg
- Department of Anatomical Sciences; HSC T-8 (040); Stony Brook University; Stony Brook NY 11794-8081 USA
- Karen E. Poole
- Department of Anatomical Sciences; HSC T-8 (040); Stony Brook University; Stony Brook NY 11794-8081 USA
13
Samuels ME, Regnault S, Hutchinson JR. Evolution of the patellar sesamoid bone in mammals. PeerJ 2017; 5:e3103. [PMID: 28344905 PMCID: PMC5363259 DOI: 10.7717/peerj.3103] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 11/11/2016] [Accepted: 02/17/2017] [Indexed: 12/22/2022] Open
Abstract
The patella is a sesamoid bone located in the major extensor tendon of the knee joint, in the hindlimb of many tetrapods. Although numerous aspects of knee morphology are ancient and conserved among most tetrapods, the evolutionary occurrence of an ossified patella is highly variable. Among extant (crown clade) groups it is found in most birds, most lizards, the monotreme mammals and almost all placental mammals, but it is absent in most marsupial mammals as well as many reptiles. Here, we integrate data from the literature and first-hand studies of fossil and recent skeletal remains to reconstruct the evolution of the mammalian patella. We infer that bony patellae most likely evolved between four and six times in crown group Mammalia: in monotremes, in the extinct multituberculates, in one or more stem-mammal genera outside of therian or eutherian mammals and up to three times in therian mammals. Furthermore, an ossified patella was lost several times in mammals, not including those with absent hindlimbs: once or more in marsupials (with some re-acquisition) and at least once in bats. Our inferences about patellar evolution in mammals are reciprocally informed by the existence of several human genetic conditions in which the patella is either absent or severely reduced. Clearly, development of the patella is under close genomic control, although its responsiveness to its mechanical environment is also important (and perhaps variable among taxa). Where a bony patella is present it plays an important role in hindlimb function, especially in resisting gravity by providing an enhanced lever system for the knee joint. Yet the evolutionary origins, persistence and modifications of a patella in diverse groups with widely varying habits and habitats (from digging to running to aquatic, small or large body sizes, bipeds or quadrupeds) remain complex and perplexing, impeding a conclusive synthesis of form, function, development and genetics across mammalian evolution. This meta-analysis takes an initial step toward such a synthesis by collating available data and elucidating areas of promising future inquiry.
Affiliation(s)
- Mark E. Samuels
- Department of Medicine, University of Montreal, Montreal, QC, Canada
- Centre de Recherche du CHU Ste-Justine, Montreal, QC, Canada
- Sophie Regnault
- Department of Comparative Biomedical Sciences, Structure and Motion Laboratory, The Royal Veterinary College, London Hertfordshire, UK
- John R. Hutchinson
- Department of Comparative Biomedical Sciences, Structure and Motion Laboratory, The Royal Veterinary College, London Hertfordshire, UK
14
Studying the evolutionary significance of thermal adaptation in ectotherms: The diversification of amphibians' energetics. J Therm Biol 2016; 68:5-13. [PMID: 28689721 DOI: 10.1016/j.jtherbio.2016.11.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/15/2016] [Revised: 11/04/2016] [Accepted: 11/16/2016] [Indexed: 11/21/2022]
Abstract
A fundamental problem in evolutionary biology is understanding the factors that promote or constrain adaptive evolution, and assessing the role of natural selection in this process. Here, comparative phylogenetics, that is, using phylogenetic information and traits to infer evolutionary processes, has been a major paradigm. In this study, we discuss Ornstein-Uhlenbeck (OU) models in the context of thermal adaptation in ectotherms. We specifically applied this approach to study amphibians' evolution and energy metabolism. It has been hypothesized that amphibians exploit adaptive zones characterized by low energy expenditure, which generates specific predictions in terms of the patterns of diversification in standard metabolic rate (SMR). We compiled whole-animal metabolic rates for 122 species of amphibians, and fitted several models of diversification. According to the adaptive zone hypothesis, we expected: (1) to find "accelerated evolution" in SMR (i.e., diversification above Brownian Motion expectations, BM), (2) that a model assuming evolutionary optima (i.e., an OU model) fits better than a white-noise model and (3) that a model assuming multiple optima (according to the three amphibian orders) fits better than a model assuming a single optimum. As predicted, we found that the diversification of SMR occurred, most of the time, above BM expectations. Also, we found that a model assuming an optimum explained the data better than a white-noise model. However, we did not find evidence that an OU model with multiple optima fits the data better, suggesting a single optimum in SMR for Anura, Caudata and Gymnophiona. These results show how comparative phylogenetics could be applied for testing adaptive hypotheses regarding history and physiological performance in ectotherms.
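For orientation, the model families compared in this abstract are commonly written as below. The notation is a standard convention assumed here, not taken from the paper: $X$ is the trait (here SMR), $\theta$ the optimum, $\alpha$ the strength of the pull toward the optimum, $\sigma$ the rate of stochastic change, and $W$ a Wiener process.

```latex
\begin{align}
\text{BM:}\quad & dX(t) = \sigma\, dW(t) \\
\text{OU (single optimum):}\quad & dX(t) = \alpha\,\bigl(\theta - X(t)\bigr)\,dt + \sigma\, dW(t) \\
\text{OU (multiple optima):}\quad & dX(t) = \alpha\,\bigl(\theta_k - X(t)\bigr)\,dt + \sigma\, dW(t),
  \quad k \in \{\text{Anura}, \text{Caudata}, \text{Gymnophiona}\} \\
\text{White noise:}\quad & X_i \sim \mathcal{N}(\theta, \sigma^2) \ \text{independently for each species } i
\end{align}
```

Setting $\alpha = 0$ in the OU equation recovers BM, which is why the two are usually compared as nested models.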
15
Manrique JM, Jones LR. Are ocean currents too slow to counteract SAR11 evolution? A next-generation sequencing, phylogeographic analysis. Mol Phylogenet Evol 2016; 107:324-337. [PMID: 27894996 DOI: 10.1016/j.ympev.2016.11.015] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/01/2016] [Revised: 11/18/2016] [Accepted: 11/24/2016] [Indexed: 11/27/2022]
Abstract
This work set out to shed light on the phylogeography of the SAR11 clade of Alphaproteobacteria, which is probably the most abundant group of heterotrophic bacteria on Earth. In particular, we assessed the degree to which empirical evidence (environmental DNA sequences) supports the concept that SAR11 lineages evolve faster than they are dispersed, thus generating vicariant distributions, as predicted by recent simulation efforts. We generated 16S rRNA gene sequences from surface seawater collected at the South West Atlantic Ocean and combined these data with previously published sequences from similar environments from elsewhere. Altogether, these data consisted of about 1e6 reads, from which we generated 355,306 high-quality sequences, of which 95,318 corresponded to SAR11. Quantitative phylogeographic analyses supported the existence of a spatially explicit distribution of SAR11 species and provided evidence in favor of the idea that dispersal limitations significantly contribute to SAR11 radiation throughout the world's oceans. Likewise, pairwise phylogenetic distances between the communities studied here were significantly correlated with the genetic divergences predicted by a previously proposed neutral model. As discussed in the paper, these findings are compatible with the concept that the ocean surface constitutes a homogeneous environment for SAR11, in agreement with previous experimental data. We discuss the implications of this hypothesis in a global change scenario. This is the first study combining high-throughput sequencing and phylogenetic analysis to study bacterial phylogeography and reporting a distance decay pattern of phylogenetic distances for bacteria.
Affiliation(s)
- Julieta M Manrique
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, (C1083ACA) Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, Argentina
- Leandro R Jones
- Consejo Nacional de Investigaciones Científicas y Técnicas, Av. Rivadavia 1917, (C1083ACA) Buenos Aires, Argentina; Laboratorio de Virología y Genética Molecular, Facultad de Ciencias Naturales sede Trelew, Universidad Nacional de la Patagonia San Juan Bosco, Argentina.
16
Mirande JM. Combined phylogeny of ray-finned fishes (Actinopterygii) and the use of morphological characters in large-scale analyses. Cladistics 2016; 33:333-350. [DOI: 10.1111/cla.12171] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Accepted: 06/30/2016] [Indexed: 01/27/2023] Open
Affiliation(s)
- Juan Marcos Mirande
- Unidad Ejecutora Lillo (UEL, Fundación Miguel Lillo-CONICET); San Miguel de Tucumán 4000 Argentina
17
Coddington JA, Agnarsson I, Cheng RC, Čandek K, Driskell A, Frick H, Gregorič M, Kostanjšek R, Kropf C, Kweskin M, Lokovšek T, Pipan M, Vidergar N, Kuntner M. DNA barcode data accurately assign higher spider taxa. PeerJ 2016; 4:e2201. [PMID: 27547527 PMCID: PMC4958005 DOI: 10.7717/peerj.2201] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 01/11/2016] [Accepted: 06/10/2016] [Indexed: 12/24/2022] Open
Abstract
The use of unique DNA sequences as a method for taxonomic identification is no longer fundamentally controversial, even though debate continues on the best markers, methods, and technology to use. Although both existing databanks such as GenBank and BOLD, as well as reference taxonomies, are imperfect, in best case scenarios "barcodes" (whether single or multiple, organelle or nuclear, loci) clearly are an increasingly fast and inexpensive method of identification, especially as compared to manual identification of unknowns by increasingly rare expert taxonomists. Because most species on Earth are undescribed, a complete reference database at the species level is impractical in the near term. The question therefore arises whether unidentified species can, using DNA barcodes, be accurately assigned to more inclusive groups such as genera and families-taxonomic ranks of putatively monophyletic groups for which the global inventory is more complete and stable. We used a carefully chosen test library of CO1 sequences from 49 families, 313 genera, and 816 species of spiders to assess the accuracy of genus and family-level assignment. We used BLAST queries of each sequence against the entire library and got the top ten hits. The percent sequence identity was reported from these hits (PIdent, range 75-100%). Accurate assignment of higher taxa (PIdent above which errors totaled less than 5%) occurred for genera at PIdent values >95 and families at PIdent values ≥ 91, suggesting these as heuristic thresholds for accurate generic and familial identifications in spiders. Accuracy of identification increases with numbers of species/genus and genera/family in the library; above five genera per family and fifteen species per genus all higher taxon assignments were correct. We propose that using percent sequence identity between conventional barcode sequences may be a feasible and reasonably accurate method to identify animals to family/genus. 
However, the quality of the underlying database impacts accuracy of results; many outliers in our dataset could be attributed to taxonomic and/or sequencing errors in BOLD and GenBank. It seems that an accurate and complete reference library of families and genera of life could provide accurate higher level taxonomic identifications cheaply and accessibly, within years rather than decades.
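The heuristic thresholds reported in this abstract can be sketched as a small decision rule. This is only an illustration under stated assumptions: the function name, the `(pident, genus, family)` tuple layout, and the example taxa are hypothetical, and only the two cut-offs (PIdent > 95 for genus, PIdent ≥ 91 for family) come from the abstract.

```python
# Thresholds taken from the abstract; everything else is an illustrative assumption.
GENUS_MIN_PIDENT = 95.0   # genus assignment reported accurate at PIdent > 95
FAMILY_MIN_PIDENT = 91.0  # family assignment reported accurate at PIdent >= 91


def assign_higher_taxon(hits):
    """Pick a rank for a query from its BLAST hits.

    hits: list of (pident, genus, family) tuples (hypothetical layout,
    e.g. parsed from tabular BLAST output joined with a taxonomy table).
    Returns (rank, name), where rank is "genus", "family", or "unassigned".
    """
    if not hits:
        return ("unassigned", None)
    # Use the single best hit by percent identity.
    pident, genus, family = max(hits, key=lambda h: h[0])
    if pident > GENUS_MIN_PIDENT:
        return ("genus", genus)
    if pident >= FAMILY_MIN_PIDENT:
        return ("family", family)
    return ("unassigned", None)
```

A query whose best hit has 92% identity would, under this rule, be reported at family level only.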
Affiliation(s)
- Jonathan A. Coddington
- National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
| | - Ingi Agnarsson
- National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
- Department of Biology, University of Vermont, Burlington, Vermont, United States
| | - Ren-Chung Cheng
- EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
| | - Klemen Čandek
- EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
| | - Amy Driskell
- National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
| | - Holger Frick
- Department of Invertebrates, Natural History Museum Bern, Bern, Switzerland
| | - Matjaž Gregorič
- EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
| | - Rok Kostanjšek
- Department of Biology, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
| | - Christian Kropf
- Department of Invertebrates, Natural History Museum Bern, Bern, Switzerland
| | - Matthew Kweskin
- National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
| | - Tjaša Lokovšek
- EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
| | - Miha Pipan
- EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| | - Nina Vidergar
- EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
| | - Matjaž Kuntner
- National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States
- EZ Lab, Institute of Biology, Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana, Slovenia
| |
|
18
|
Tsirogiannis C, Sandel B. Fast Computations for Measures of Phylogenetic Beta Diversity. PLoS One 2016; 11:e0151167. [PMID: 27054697 PMCID: PMC4824508 DOI: 10.1371/journal.pone.0151167] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/28/2015] [Accepted: 02/24/2016] [Indexed: 11/18/2022] Open
Abstract
For many applications in ecology, it is important to examine the phylogenetic relations between two communities of species. More formally, let [Formula: see text] be a phylogenetic tree and let A and B be two samples of its tips, representing the examined communities. We want to compute a value that expresses the phylogenetic diversity between A and B in [Formula: see text]. There exist several measures that can do this; these are the so-called phylogenetic beta diversity (β-diversity) measures. Two popular measures of this kind are the Community Distance (CD) and the Common Branch Length (CBL). In most applications, it is not sufficient to compute the value of a beta diversity measure for two communities A and B; we also want to know if this value is relatively large or small compared to all possible pairs of communities in [Formula: see text] that have the same size. To decide this, the ideal approach is to compute a standardised index that involves the mean and the standard deviation of this measure among all pairs of species samples that have the same number of elements as A and B. However, no method exists for computing exactly and efficiently this index for CD and CBL. We present analytical expressions for computing the expectation and the standard deviation of CD and CBL. Based on these expressions, we describe efficient algorithms for computing the standardised indices of the two measures. Using standard algorithmic analysis, we provide guarantees on the theoretical efficiency of our algorithms. We implemented our algorithms and measured their efficiency in practice. Our implementations compute the standardised indices of CD and CBL in less than twenty seconds for a hundred pairs of samples on trees with 7 ⋅ 10^4 tips. Our implementations are available through the R package PhyloMeasures.
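To make the idea of a standardised index concrete, the sketch below estimates it by random sampling. This is explicitly not the exact analytical method the abstract describes; it is the slower Monte Carlo approximation that such methods are meant to replace. It assumes CD is the mean tip-to-tip path distance over all pairs with one tip in A and one in B, and that distances are supplied in a dictionary keyed by `frozenset` pairs; both the data layout and the function names are hypothetical.

```python
import random
from itertools import product
from statistics import mean, pstdev


def community_distance(dist, a, b):
    """Mean path distance over all pairs (x, y) with x in a and y in b.

    dist maps frozenset({x, y}) to the path distance between tips x and y
    (an assumed input layout); the distance from a tip to itself is 0.
    """
    return mean(0.0 if x == y else dist[frozenset((x, y))]
                for x, y in product(a, b))


def standardised_cd(dist, tips, a, b, n_draws=1000, seed=0):
    """Sampling-based estimate of (observed - null mean) / null sd.

    The null distribution is built from random tip samples of the same
    sizes as a and b. tips must be a list so it can be sampled.
    """
    rng = random.Random(seed)
    observed = community_distance(dist, a, b)
    null = [community_distance(dist,
                               rng.sample(tips, len(a)),
                               rng.sample(tips, len(b)))
            for _ in range(n_draws)]
    sd = pstdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0
```

A positive index indicates that the two communities are further apart on the tree than random samples of the same sizes tend to be.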
Affiliation(s)
| | - Brody Sandel
- MADALGO and Department of Bioscience, Aarhus University, Aarhus, Denmark
| |
|
19
|
Goicoechea N, Frost DR, De la Riva I, Pellegrino KCM, Sites J, Rodrigues MT, Padial JM. Molecular systematics of teioid lizards (Teioidea/Gymnophthalmoidea: Squamata) based on the analysis of 48 loci under tree‐alignment and similarity‐alignment. Cladistics 2016; 32:624-671. [DOI: 10.1111/cla.12150] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Accepted: 12/28/2015] [Indexed: 02/02/2023] Open
Affiliation(s)
- Noemí Goicoechea
- Department of Biodiversity and Evolutionary Biology Museo Nacional de Ciencias Naturales‐CSIC C/ José Gutiérrez Abascal 2 28006 Madrid Spain
| | - Darrel R. Frost
- Division of Vertebrate Zoology (Herpetology) American Museum of Natural History Central Park West at 79th Street New York NY 10024 USA
| | - Ignacio De la Riva
- Department of Biodiversity and Evolutionary Biology Museo Nacional de Ciencias Naturales‐CSIC C/ José Gutiérrez Abascal 2 28006 Madrid Spain
| | - Katia C. M. Pellegrino
- Departamento de Ciências Biológicas Universidade Federal de São Paulo Avenida Professor Artur Riedel 275 Diadema São Paulo CEP 09972‐270 Brazil
| | - Jack Sites
- Department of Biology and M.L. Bean Life Science Museum Brigham Young University Provo UT 84602 USA
| | - Miguel T. Rodrigues
- Departamento de Zoologia Instituto de Biociências Universidade de São Paulo São Paulo CEP: 05508‐090 Brazil
| | - José M. Padial
- Section of Amphibians and Reptiles Carnegie Museum of Natural History 4400 Forbes Avenue Pittsburgh PA 15213 USA
| |
|
20
|
Les DH. Water from the rock: Ancient aquatic angiosperms flow from the fossil record. Proc Natl Acad Sci U S A 2015; 112:10825-6. [PMID: 26290578 PMCID: PMC4568265 DOI: 10.1073/pnas.1514280112] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022] Open
Affiliation(s)
- Donald H Les
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269
| |
|
21
|
Gomez B, Daviero-Gomez V, Coiffard C, Martín-Closas C, Dilcher DL. Montsechia, an ancient aquatic angiosperm. Proc Natl Acad Sci U S A 2015; 112:10985-8. [PMID: 26283347 PMCID: PMC4568254 DOI: 10.1073/pnas.1509241112] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Indexed: 01/04/2023] Open
Abstract
The early diversification of angiosperms in diverse ecological niches is poorly understood. Some have proposed an origin in a darkened forest habitat and others an open aquatic or near aquatic habitat. The research presented here centers on Montsechia vidalii, first recovered from lithographic limestone deposits in the Pyrenees of Spain more than 100 y ago. This fossil material has been poorly understood and misinterpreted in the past. Now, based upon the study of more than 1,000 carefully prepared specimens, a detailed analysis of Montsechia is presented. The morphology and anatomy of the plant, including aspects of its reproduction, suggest that Montsechia is sister to Ceratophyllum (whenever cladistic analyses are made with or without a backbone). Montsechia was an aquatic angiosperm living and reproducing below the surface of the water, similar to Ceratophyllum. Montsechia is Barremian in age, raising questions about the very early divergence of the Ceratophyllum clade compared with its position as sister to eudicots in many cladistic analyses. Lower Cretaceous aquatic angiosperms, such as Archaefructus and Montsechia, open the possibility that aquatic plants were locally common at a very early stage of angiosperm evolution and that aquatic habitats may have played a major role in the diversification of some early angiosperm lineages.
Affiliation(s)
- Bernard Gomez
- CNRS-UMR 5276 Laboratoire de Géologie de Lyon-Terre, Planètes, Environnement, Université Lyon 1 (Claude Bernard), 69622 Villeurbanne, France;
| | - Véronique Daviero-Gomez
- CNRS-UMR 5276 Laboratoire de Géologie de Lyon-Terre, Planètes, Environnement, Université Lyon 1 (Claude Bernard), 69622 Villeurbanne, France
| | - Clément Coiffard
- Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, 10115 Berlin, Germany
| | - Carles Martín-Closas
- Departament d'Estratigrafia, Paleontologia i Geociències marines, Facultat de Geologia, Universitat de Barcelona, 08028 Barcelona, Catalonia, Spain
| | - David L Dilcher
- Department of Geological Sciences, Indiana University, Bloomington, IN 47405
| |
|
22
|
Machado DJ. YBYRÁ facilitates comparison of large phylogenetic trees. BMC Bioinformatics 2015; 16:204. [PMID: 26130249 PMCID: PMC4488063 DOI: 10.1186/s12859-015-0642-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/25/2015] [Accepted: 06/06/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The number and size of tree topologies that are being compared by phylogenetic systematists is increasing due to technological advancements in high-throughput DNA sequencing. However, we still lack tools to facilitate comparison among phylogenetic trees with a large number of terminals. RESULTS The "YBYRÁ" project integrates software solutions for data analysis in phylogenetics. It comprises tools for (1) topological distance calculation based on the number of shared splits or clades, (2) sensitivity analysis and automatic generation of sensitivity plots and (3) clade diagnoses based on different categories of synapomorphies. YBYRÁ also provides (4) an original framework to facilitate the search for potential rogue taxa based on how much they affect average matching split distances (using MSdist). CONCLUSIONS YBYRÁ facilitates comparison of large phylogenetic trees and outperforms competing software in terms of usability and time efficiency, especially for large data sets. The programs that comprise this toolkit are written in Python, hence they do not require installation and have minimum dependencies. The entire project is available under an open-source licence at http://www.ib.usp.br/grant/anfibios/researchSoftware.html .
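A topological distance based on shared clades, as mentioned in item (1), can be illustrated with a generic sketch. This is not YBYRÁ's code or interface; the nested-tuple tree encoding and both function names are assumptions made for the example, which simply counts clades present in one rooted tree but not the other.

```python
def clades(tree):
    """Return the set of clades (frozensets of tip labels) of a rooted tree.

    tree is a nested tuple whose leaves are tip-label strings
    (an assumed encoding, e.g. (("A", "B"), "C")).
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        # The clade at an internal node is the union of its children's tips.
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def clade_distance(tree1, tree2):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(tree1) ^ clades(tree2))
```

For example, `clade_distance(((("A", "B"), "C"), "D"), ((("A", "C"), "B"), "D"))` counts the two conflicting clades {A, B} and {A, C}, while the clades {A, B, C} and {A, B, C, D} are shared.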
Affiliation(s)
- Denis Jacob Machado
- Inter-institutional Grad Program on Bioinformatics, University of São Paulo, Rua do Matão, tv. 14, no. 101, sala 137, São Paulo, 05508-090, Brazil.
| |
|
23
|
Goloboff PA, Szumik CA. Identifying unstable taxa: Efficient implementation of triplet-based measures of stability, and comparison with Phyutility and RogueNaRok. Mol Phylogenet Evol 2015; 88:93-104. [DOI: 10.1016/j.ympev.2015.04.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 11/07/2014] [Revised: 04/03/2015] [Accepted: 04/05/2015] [Indexed: 10/23/2022]
|
24
|
Building the avian tree of life using a large-scale, sparse supermatrix. Mol Phylogenet Evol 2015; 84:53-63. [DOI: 10.1016/j.ympev.2014.12.003] [Citation(s) in RCA: 98] [Impact Index Per Article: 10.9] [Received: 07/09/2014] [Revised: 12/03/2014] [Accepted: 12/05/2014] [Indexed: 11/20/2022]
|
25
|
Dodsworth S, Chase MW, Kelly LJ, Leitch IJ, Macas J, Novák P, Piednoël M, Weiss-Schneeweiss H, Leitch AR. Genomic repeat abundances contain phylogenetic signal. Syst Biol 2015; 64:112-26. [PMID: 25261464 PMCID: PMC4265144 DOI: 10.1093/sysbio/syu080] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 05/07/2014] [Accepted: 09/18/2014] [Indexed: 12/12/2022] Open
Abstract
A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution.
Affiliation(s)
- Steven Dodsworth
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Mark W Chase
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Laura J Kelly
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Ilia J Leitch
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Jiří Macas
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Petr Novák
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Mathieu Piednoël
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Hanna Weiss-Schneeweiss
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| | - Andrew R Leitch
- School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, UK; Jodrell Laboratory, Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3DS, UK; School of Plant Biology, The University of Western Australia, Crawley WA 6009, Australia; Institute of Plant Molecular Biology, Biology Centre ASCR, Branišovská 31, České Budějovice, CZ-37005, Czech Republic; Systematic Botany and Mycology, University of Munich (LMU), Menzinger Straße 67, 80638 München, Germany; and Department of Systematic and Evolutionary Botany, University of Vienna, Rennweg 14, A-1030 Vienna, Austria
| |
|
26
|
Dubious resolution and support from published sparse supermatrices: The importance of thorough tree searches. Mol Phylogenet Evol 2014; 78:334-48. [DOI: 10.1016/j.ympev.2014.06.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/09/2014] [Revised: 05/30/2014] [Accepted: 06/01/2014] [Indexed: 11/17/2022]
|
27
|
Chesters D, Zhu CD. A protocol for species delineation of public DNA databases, applied to the Insecta. Syst Biol 2014; 63:712-25. [PMID: 24929897 DOI: 10.1093/sysbio/syu038] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/30/2023] Open
Abstract
Public DNA databases are composed of data from many different taxa, although the taxonomic annotation on sequences is not always complete, which impedes the utilization of mined data for species-level applications. There is much ongoing work on species identification and delineation based on the molecular data itself, although applying species clustering to whole databases requires consolidation of results from numerous undefined gene regions, and introduces significant obstacles in data organization and computational load. In the current paper, we demonstrate an approach for species delineation of a sequence database. All DNA sequences for the insects were obtained and processed. After filtration of duplicated data, delineation of the database into species or molecular operational taxonomic units (MOTUs) followed a three-step process in which (i) the genetic loci L are partitioned, (ii) the species S are delineated within each locus, then (iii) species units are matched across loci to form the matrix L × S, a set of global (multilocus) species units. Partitioning the database into a set of homologous gene fragments was achieved by Markov clustering using edge weights calculated from the amount of overlap between pairs of sequences, then delineation of species units and assignment of species names were performed for the set of genes necessary to capture most of the species diversity. The complexity of computing pairwise similarities for species clustering was substantial at the cytochrome oxidase subunit I locus in particular, but made feasible through the development of software that performs pairwise alignments within the taxonomic framework, while accounting for the different ranks at which sequences are labeled with taxonomic information. 
Over 24 different homologs, the unidentified sequences numbered approximately 194,000, containing 41,525 species IDs (98.7% of all found in the insect database), and were grouped into 59,173 single-locus MOTUs by hierarchical clustering under parameters optimized independently for each locus. Species units from different loci were matched using a multipartite matching algorithm to form multilocus species units with minimal incongruence between loci. After matching, the insect database as represented by these 24 loci was found to be composed of 78,091 species units in total. 38,574 of these units contained only species labeled data, 34,891 contained only unlabeled data, leaving 4,626 units composed both of labeled and unlabeled sequences. In addition to giving estimates of species diversity of sequence repositories, the protocol developed here will facilitate species-level applications of modern-day sequence data sets. In particular, the L × S matrix represents a post-taxonomic framework that can be used for species-level organization of metagenomic data, and incorporation of these methods into phylogenetic pipelines will yield matrices more representative of species diversity.
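Step (ii) of the protocol, grouping the sequences of one locus into MOTUs by hierarchical clustering, can be illustrated with a minimal single-linkage sketch. The abstract does not specify the linkage rule, distance measure, or thresholds, so all of those are assumptions here, as are the function name and input layout.

```python
def cluster_motus(seq_ids, distances, threshold):
    """Single-linkage clustering of sequences at one locus (illustrative only).

    seq_ids: list of sequence identifiers.
    distances: dict mapping (id_a, id_b) pairs to a pairwise distance
    (assumed layout; missing pairs are treated as not linked).
    threshold: pairs at or below this distance are joined; per the abstract,
    clustering parameters would be optimized separately for each locus.
    Returns a list of sets of identifiers, one set per MOTU.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in seq_ids:
        groups.setdefault(find(s), set()).add(s)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Cutting a single-linkage hierarchy at a fixed distance is equivalent to taking connected components of the "distance ≤ threshold" graph, which is why a union-find structure suffices for this sketch.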
Affiliation(s)
- Douglas Chesters
- Key Laboratory of Zoological Systematics and Evolution (CAS), Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, PR China
| | - Chao-Dong Zhu
- Key Laboratory of Zoological Systematics and Evolution (CAS), Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, PR China
| |
|
28
|
Phylogeny and evolution of RNA structure. Methods Mol Biol 2014. [PMID: 24639167 DOI: 10.1007/978-1-62703-709-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Darwin's conviction that all living beings on Earth are related and the graph of relatedness is tree-shaped has been essentially confirmed by phylogenetic reconstruction first from morphology and later from data obtained by molecular sequencing. Limitations of the phylogenetic tree concept were recognized as more and more sequence information became available. The other path-breaking idea of Darwin, natural selection of fitter variants in populations, is cast into simple mathematical form and extended to mutation-selection dynamics. In this form the theory is directly applicable to RNA evolution in vitro and to virus evolution. Phylogeny and population dynamics of RNA provide complementary insights into evolution and the interplay between the two concepts will be pursued throughout this chapter. The two strategies for understanding evolution are ultimately related through the central paradigm of structural biology: sequence ⇒ structure ⇒ function. We elaborate on the state of the art in modeling both phylogeny and evolution of RNA driven by reproduction and mutation. Thereby the focus will be laid on models for phylogenetic sequence evolution as well as evolution and design of RNA structures with selected examples and notes on simulation methods. In the perspectives an attempt is made to combine molecular structure, population dynamics, and phylogeny in modeling evolution.
|
29
|
Allcock AL, Lindgren A, Strugnell J. The contribution of molecular data to our understanding of cephalopod evolution and systematics: a review. J NAT HIST 2014. [DOI: 10.1080/00222933.2013.825342] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 01/17/2023]
|
30
|
Wu J, Hasegawa M, Zhong Y, Yonezawa T. Importance of synonymous substitutions under dense taxon sampling and appropriate modeling in reconstructing the mitogenomic tree of Eutheria. Genes Genet Syst 2014; 89:237-51. [DOI: 10.1266/ggs.89.237] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022] Open
Affiliation(s)
- Jiaqi Wu
- School of Life Sciences, Fudan University
| | - Masami Hasegawa
- The Institute of Statistical Mathematics
- School of Life Sciences, Fudan University
| | - Yang Zhong
- Institute of Biodiversity Science and Geobiology, Tibet University
- School of Life Sciences, Fudan University
| | | |
|
31
|
An artifact caused by undersampling optimal trees in supermatrix analyses of locally sampled characters. Mol Phylogenet Evol 2013; 69:265-75. [DOI: 10.1016/j.ympev.2013.06.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 04/20/2013] [Revised: 05/28/2013] [Accepted: 06/01/2013] [Indexed: 11/22/2022]
|
32
|
Goloboff PA. Oblong, a program to analyse phylogenomic data sets with millions of characters, requiring negligible amounts of RAM. Cladistics 2013; 30:273-281. [DOI: 10.1111/cla.12056] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Accepted: 08/19/2013] [Indexed: 11/27/2022] Open
Affiliation(s)
- Pablo A. Goloboff
- Consejo Nacional de Investigaciones Científicas y Técnicas; Instituto Superior de Entomología; Miguel Lillo 205 S. M. de Tucumán 4000 Argentina
| |
|
33
|
Affiliation(s)
- Pablo A. Goloboff
- CONICET; INSUE; Instituto Miguel Lillo; 4000 S.M. de Tucumán Argentina
| |
|
34
|
Beaulieu JM, Donoghue MJ. Fruit evolution and diversification in campanulid angiosperms. Evolution 2013; 67:3132-44. [PMID: 24151998 DOI: 10.1111/evo.12180] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 01/28/2013] [Accepted: 05/30/2013] [Indexed: 11/27/2022]
Abstract
With increases in both the size and scope of phylogenetic trees, we are afforded a renewed opportunity to address long-standing comparative questions, such as whether particular fruit characters account for much of the variation in diversity among flowering plant clades. Studies to date have reported conflicting results, largely as a consequence of taxonomic scale and a reliance on potentially conservative statistical measures. Here we examine a larger and older angiosperm clade, the Campanulidae, and infer the rates of character transitions among the major fruit types, emphasizing the evolution of the achene fruits that are most frequently observed within the group. Our analyses imply that campanulids likely originated bearing capsules, and that all subsequent fruit diversity was derived from various modifications of this dry fruit type. We also found that the preponderance of lineages bearing achenes is a consequence of not only being a fruit type that is somewhat irreversible once it evolves, but one that also seems to have a positive association with diversification rates. Although these results imply the achene fruit type is a significant correlate of diversity patterns observed across campanulids, we conclude that it remains difficult to confidently and directly view this character state as the actual cause of increased diversification rates.
Affiliation(s)
- Jeremy M Beaulieu
- Department of Ecology and Evolutionary Biology, Yale University, P.O. Box 208106, New Haven, Connecticut, 10620.
| | | |
|
35
|
Gregor I, Steinbrück L, McHardy AC. PTree: pattern-based, stochastic search for maximum parsimony phylogenies. PeerJ 2013; 1:e89. [PMID: 23825794 PMCID: PMC3698465 DOI: 10.7717/peerj.89] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/30/2012] [Accepted: 05/28/2013] [Indexed: 11/24/2022] Open
Abstract
Phylogenetic reconstruction is vital to analyzing the evolutionary relationship of genes within and across populations of different species. Nowadays, with next generation sequencing technologies producing sets comprising thousands of sequences, robust identification of the tree topology, which is optimal according to standard criteria such as maximum parsimony, maximum likelihood or posterior probability, with phylogenetic inference methods is a computationally very demanding task. Here, we describe a stochastic search method for a maximum parsimony tree, implemented in a software package we named PTree. Our method is based on a new pattern-based technique that enables us to infer intermediate sequences efficiently where the incorporation of these sequences in the current tree topology yields a phylogenetic tree with a lower cost. Evaluation across multiple datasets showed that our method is comparable to the algorithms implemented in PAUP* or TNT, which are widely used by the bioinformatics community, in terms of topological accuracy and runtime. We show that our method can process large-scale datasets of 1,000-8,000 sequences. We believe that our novel pattern-based method enriches the current set of tools and methods for phylogenetic tree inference. The software is available under: http://algbio.cs.uni-duesseldorf.de/webapps/wa-download/.
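The "cost" that a maximum parsimony search tries to lower can be illustrated with a minimal sketch. The code below scores a fixed rooted binary tree with Fitch's small-parsimony algorithm; it is not PTree's pattern-based search, and the nested-tuple tree encoding, the function names, and the toy alignment are assumptions made only for illustration.

```python
def fitch_score(tree, states):
    """Return (candidate_state_set, cost) for one character on a rooted binary tree.

    tree   -- a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its observed character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: one substitution is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_cost(tree, alignment):
    """Sum the Fitch cost over every column of an alignment (dict: taxon -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


# Hypothetical toy data: four taxa, four aligned sites.
alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCCA"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_cost(tree_1, alignment))  # 3
print(parsimony_cost(tree_2, alignment))  # 4
```

Under this criterion tree_1 would be preferred, since it explains the toy alignment with fewer changes; a search heuristic repeatedly proposes topologies and keeps those with lower cost.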
Affiliation(s)
- Ivan Gregor
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, Saarbrücken, Germany
| | - Lars Steinbrück
- Department of Algorithmic Bioinformatics, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
| | - Alice C. McHardy
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, Saarbrücken, Germany
- Department of Algorithmic Bioinformatics, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany
|
36
|
Stoltzfus A, Lapp H, Matasci N, Deus H, Sidlauskas B, Zmasek CM, Vaidya G, Pontelli E, Cranston K, Vos R, Webb CO, Harmon LJ, Pirrung M, O'Meara B, Pennell MW, Mirarab S, Rosenberg MS, Balhoff JP, Bik HM, Heath TA, Midford PE, Brown JW, McTavish EJ, Sukumaran J, Westneat M, Alfaro ME, Steele A, Jordan G. Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC Bioinformatics 2013; 14:158. [PMID: 23668630 PMCID: PMC3669619 DOI: 10.1186/1471-2105-14-158] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/18/2013] [Accepted: 04/30/2013] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image. 
CONCLUSIONS Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.
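The "prune away unneeded parts" step described in this abstract can be sketched as follows. This is only an illustration of pruning a larger tree down to a query list of taxa; the nested-tuple tree encoding and the `prune` function are hypothetical and are not taken from the Phylotastic code base.

```python
def prune(tree, keep):
    """Restrict a tree to the taxa in `keep`.

    tree -- a leaf name (str) or a tuple of subtrees
    keep -- set of leaf names requested by the user
    Returns the pruned tree, or None if no requested taxon is present.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept_children = []
    for child in tree:
        pruned = prune(child, keep)
        if pruned is not None:
            kept_children.append(pruned)
    if not kept_children:
        return None
    if len(kept_children) == 1:
        # Suppress an internal node left with a single child.
        return kept_children[0]
    return tuple(kept_children)


# Hypothetical "expert" tree and query list.
big_tree = (("A", "B"), ("C", ("D", "E")))
custom_tree = prune(big_tree, {"A", "D", "E"})
print(custom_tree)  # ('A', ('D', 'E'))
```

A full system would also have to reconcile non-standard names and carry over branch lengths, which this sketch leaves out.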
Affiliation(s)
- Arlin Stoltzfus
- Institute for Bioscience and Biotechnology Research (IBBR), Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Naim Matasci
- The iPlant Collaborative and EEB Department, University of Arizona, 1657 E Helen St, Tucson, AZ, 85721, USA
| | - Helena Deus
- Digital Enterprise Research Institute, National University of Ireland, University Road, Galway, Ireland
| | - Brian Sidlauskas
- Department of Fisheries and Wildlife, Oregon State University, 104 Nash Hall, Corvallis, OR, 97331-3803, USA
| | - Christian M Zmasek
- Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA
| | - Gaurav Vaidya
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, 80309-0334, USA
| | - Enrico Pontelli
- Department of Computer Science, New Mexico State University, MSC CS, Box 30001, Las Cruces, NM, 88003, USA
| | - Karen Cranston
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Rutger Vos
- NCB Naturalis, Einsteinweg 2, Leiden, 2333 CC, the Netherlands
| | - Campbell O Webb
- Arnold Arboretum of Harvard University, Boston, MA, 02130, USA
| | - Luke J Harmon
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
| | - Megan Pirrung
- University of Colorado Denver Anschutz Medical Campus, Aurora, CO, 80045, USA
| | - Brian O'Meara
- Department of Ecology & Evolutionary Biology, 569 Dabney Hall, University of Tennessee, Knoxville, TN, 37996, USA
| | - Matthew W Pennell
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
| | - Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78701, USA
| | - Michael S Rosenberg
- Center for Evolutionary Medicine and Informatics, The Biodesign Institute, and School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA
| | - James P Balhoff
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Holly M Bik
- UC Davis Genome Center, One Shields Ave, Davis, CA, 95618, USA
| | - Tracy A Heath
- Department of Integrative Biology, University of California, Berkeley, CA, 94720-3140, USA
| | - Peter E Midford
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Joseph W Brown
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
| | | | - Jeet Sukumaran
- Biology Department, Duke University, Biological Sciences Building, 125 Science Drive, Durham, NC, 27708, USA
| | - Mark Westneat
- Biodiversity Synthesis Center, Field Museum of Natural History, 1400 S Lakeshore Dr, Chicago, IL, 60605, USA
| | - Michael E Alfaro
- Department of Ecology and Evolutionary Biology, University of California Los Angeles, 621 Charles E. Young Dr South, Los Angeles, CA, 90095, USA
| | - Aaron Steele
- U.C. Berkeley Museum of Vertebrate Zoology, University of California, 3101 Valley Life Sciences Building, Berkeley, CA, 94720, USA
| | - Greg Jordan
- Paperpile, 34 Houghton Street, Somerville, MA, 02143, USA
|
37
|
Chesters D, Vogler AP. Resolving Ambiguity of Species Limits and Concatenation in Multilocus Sequence Data for the Construction of Phylogenetic Supermatrices. Syst Biol 2013; 62:456-66. [DOI: 10.1093/sysbio/syt011] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Affiliation(s)
- Douglas Chesters
- Department of Entomology, Natural History Museum, London SW7 5BD, UK; Division of Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire SL5 7PY, UK; and Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
| | - Alfried P. Vogler
- Department of Entomology, Natural History Museum, London SW7 5BD, UK; Division of Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire SL5 7PY, UK; and Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
|
38
|
Janies DA, Studer J, Handelman SK, Linchangco G. A comparison of supermatrix and supertree methods for multilocus phylogenetics using organismal datasets. Cladistics 2013; 29:560-566. [DOI: 10.1111/cla.12014] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Accepted: 12/31/2012] [Indexed: 11/28/2022] Open
Affiliation(s)
- Daniel A. Janies
- Department of Bioinformatics and Genomics; College of Computing and Informatics; University of North Carolina at Charlotte; 9201 University City Blvd; Charlotte; NC; 28223; USA
| | - Jonathon Studer
- Case Western Reserve University School of Law; 11075 East Boulevard; Cleveland; OH; 44106; USA
| | - Samuel K. Handelman
- Department of Pharmacology; College of Medicine; The Ohio State University; 333 W. 10th Ave.; Columbus; OH; 43210; USA
| | - Gregorio Linchangco
- Department of Bioinformatics and Genomics; College of Computing and Informatics; University of North Carolina at Charlotte; 9201 University City Blvd; Charlotte; NC; 28223; USA
|
39
|
Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]
|
40
|
O'Meara BC. Evolutionary Inferences from Phylogenies: A Review of Methods. Annual Review of Ecology, Evolution, and Systematics 2012. [DOI: 10.1146/annurev-ecolsys-110411-160331] [Citation(s) in RCA: 169] [Impact Index Per Article: 14.1] [Indexed: 11/09/2022]
Affiliation(s)
- Brian C. O'Meara
- Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, Tennessee 37996
|
41
|
Sheikh SI, Kahveci T, Ranka S, Gordon Burleigh J. Stability analysis of phylogenetic trees. Bioinformatics 2012; 29:166-74. [PMID: 23162082 DOI: 10.1093/bioinformatics/bts657] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Phylogenetics, or reconstructing the evolutionary relationships of organisms, is critical for understanding evolution. A large number of heuristic algorithms for phylogenetics have been developed, some of which enable estimates of trees with tens of thousands of taxa. Such trees may not be robust, as small changes in the input data can cause major differences in the optimal topology. Tools that can assess the quality and stability of phylogenetic tree estimates and identify the most reliable parts of the tree are needed. RESULTS We define measures that assess the stability of trees, subtrees and individual taxa with respect to changes in the input sequences. Our measures consider changes at the finest granularity in the input data (i.e. individual nucleotides). We demonstrate the effectiveness of our measures on large published datasets. Our measures are computationally feasible for phylogenetic datasets consisting of tens of thousands of taxa. AVAILABILITY This software is available at http://bioinformatics.cise.ufl.edu/phylostab CONTACT sheikh@cise.ufl.edu
Affiliation(s)
- Saad I Sheikh
- Department of Computer and Information Science and Engineering, University of Florida, FL 32611, USA.
|
42
|
Dugas-Ford J, Rowell JJ, Ragsdale CW. Cell-type homologies and the origins of the neocortex. Proc Natl Acad Sci U S A 2012; 109:16974-9. [PMID: 23027930 PMCID: PMC3479531 DOI: 10.1073/pnas.1204773109] [Citation(s) in RCA: 183] [Impact Index Per Article: 15.3] [Indexed: 11/18/2022] Open
Abstract
The six-layered neocortex is a uniquely mammalian structure with evolutionary origins that remain in dispute. One long-standing hypothesis, based on similarities in neuronal connectivity, proposes that homologs of the layer 4 input and layer 5 output neurons of neocortex are present in the avian forebrain, where they contribute to specific nuclei rather than to layers. We devised a molecular test of this hypothesis based on layer-specific gene expression that is shared across rodent and carnivore neocortex. Our findings establish that the layer 4 input and the layer 5 output cell types are conserved across the amniotes, but are organized into very different architectures, forming nuclei in birds, cortical areas in reptiles, and cortical layers in mammals.
Affiliation(s)
- Jennifer Dugas-Ford
- Department of Neurobiology and Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL 60637
| | - Joanna J. Rowell
- Department of Neurobiology and Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL 60637
| | - Clifton W. Ragsdale
- Department of Neurobiology and Department of Organismal Biology and Anatomy, University of Chicago, Chicago, IL 60637
|
43
|
Ramu A, Kahveci T, Burleigh JG. A scalable method for identifying frequent subtrees in sets of large phylogenetic trees. BMC Bioinformatics 2012; 13:256. [PMID: 23033843 PMCID: PMC3543182 DOI: 10.1186/1471-2105-13-256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/05/2012] [Accepted: 09/05/2012] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. RESULTS We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. CONCLUSIONS Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
Affiliation(s)
- Avinash Ramu
- Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA
| | - Tamer Kahveci
- Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
|
44
|
Smith SA, O'Meara BC. treePL: divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics 2012; 28:2689-90. [PMID: 22908216 DOI: 10.1093/bioinformatics/bts492] [Citation(s) in RCA: 422] [Impact Index Per Article: 35.2] [Indexed: 11/14/2022]
Abstract
Ever larger phylogenies are being constructed due to the explosion of genetic data and development of high-performance phylogenetic reconstruction algorithms. However, most methods for calculating divergence times are limited to datasets that are orders of magnitude smaller than recently published large phylogenies. Here, we present an algorithm and implementation of a divergence time method using penalized likelihood that can handle datasets of thousands of taxa. We implement a method that combines the standard derivative-based optimization with a stochastic simulated annealing approach to overcome optimization challenges. We compare this approach with existing software including r8s, PATHd8 and BEAST. AVAILABILITY Source code, example files, binaries and documentation for treePL are available at https://github.com/blackrim/treePL.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA.
|
45
|
|
46
|
Goloboff PA, Catalano SA. GB-to-TNT: facilitating creation of matrices from GenBank and diagnosis of results in TNT. Cladistics 2012; 28:503-513. [DOI: 10.1111/j.1096-0031.2012.00400.x] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022] Open
|
47
|
Dimitrov D, Lopardo L, Giribet G, Arnedo MA, Alvarez-Padilla F, Hormiga G. Tangled in a sparse spider web: single origin of orb weavers and their spinning work unravelled by denser taxonomic sampling. Proc Biol Sci 2012; 279:1341-50. [PMID: 22048955 PMCID: PMC3282380 DOI: 10.1098/rspb.2011.2011] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.9] [Received: 09/26/2011] [Accepted: 10/11/2011] [Indexed: 11/12/2022] Open
Abstract
In order to study the tempo and the mode of spider orb web evolution and diversification, we conducted a phylogenetic analysis using six genetic markers along with a comprehensive taxon sample. The present analyses are the first to recover the monophyly of orb-weaving spiders based solely on DNA sequence data and an extensive taxon sample. We present the first dated orb weaver phylogeny. Our results suggest that orb weavers appeared by the Middle Triassic and underwent a rapid diversification during the end of the Triassic and Early Jurassic. By the second half of the Jurassic, most of the extant orb-weaving families and web designs were already present. The processes that may have given origin to this diversification of lineages and web architectures are discussed. A combination of biotic factors, such as key innovations in web design and silk composition, as well as abiotic environmental changes, may have played important roles in the diversification of orb weavers. Our analyses also show that increased taxon sampling density in both ingroups and outgroups greatly improves phylogenetic accuracy even when extensive data are missing. This effect is particularly important when addition of character data improves gene overlap.
Affiliation(s)
- Dimitar Dimitrov
- Center for Macroecology, Evolution and Climate, Zoological Museum, University of Copenhagen, Copenhagen, Denmark.
|
48
|
Parr CS, Guralnick R, Cellinese N, Page RD. Evolutionary informatics: unifying knowledge about the diversity of life. Trends Ecol Evol 2012; 27:94-103. [PMID: 22154516 DOI: 10.1016/j.tree.2011.11.001] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Received: 07/17/2011] [Revised: 10/31/2011] [Accepted: 11/01/2011] [Indexed: 01/23/2023]
|
49
|
Page RDM. Space, time, form: viewing the Tree of Life. Trends Ecol Evol 2011; 27:113-20. [PMID: 22209094 DOI: 10.1016/j.tree.2011.12.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 07/01/2011] [Revised: 12/05/2011] [Accepted: 12/05/2011] [Indexed: 02/06/2023]
Abstract
There are numerous ways to display a phylogenetic tree, which is reflected in the diversity of software tools available to phylogeneticists. Displaying very large trees continues to be a challenge, made ever harder as increasing computing power enables researchers to construct ever-larger trees. At the same time, computing technology is enabling novel visualisations, ranging from geophylogenies embedded on digital globes to touch-screen interfaces that enable greater interaction with evolutionary trees. In this review, I survey recent developments in phylogenetic visualisation, highlighting successful (and less successful) approaches and sketching some future directions.
Affiliation(s)
- Roderic D M Page
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, G12 8QQ, UK.
|
50
|
Izquierdo-Carrasco F, Smith SA, Stamatakis A. Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees. BMC Bioinformatics 2011; 12:470. [PMID: 22165866 PMCID: PMC3267785 DOI: 10.1186/1471-2105-12-470] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 04/23/2011] [Accepted: 12/13/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rapid accumulation of molecular sequence data, driven by novel wet-lab sequencing technologies, poses new challenges for large-scale maximum likelihood-based phylogenetic analyses on trees with more than 30,000 taxa and several genes. The three main computational challenges are: numerical stability, the scalability of search algorithms, and the high memory requirements for computing the likelihood. RESULTS We introduce methods for solving these three key problems and provide respective proof-of-concept implementations in RAxML. The mechanisms presented here are not RAxML-specific and can thus be applied to any likelihood-based (Bayesian or maximum likelihood) tree inference program. We develop a new search strategy that can reduce the time required for tree inferences by more than 50% while yielding equally good trees (in the statistical sense) for well-chosen starting trees. We present an adaptation of the Subtree Equality Vector technique for phylogenomic datasets with missing data (already available in RAxML v728) that can reduce execution times and memory requirements by up to 50%. Finally, we discuss issues pertaining to the numerical stability of the Γ model of rate heterogeneity on very large trees and argue in favor of rate heterogeneity models that use a single rate or rate category for each site to resolve these problems. CONCLUSIONS We address three major issues pertaining to large scale tree reconstruction under maximum likelihood and propose respective solutions. Respective proof-of-concept/production-level implementations of our ideas are made available as open-source code.
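The numerical stability concern raised in this abstract can be made concrete with a generic floating-point illustration. The sketch below is not RAxML's scaling scheme; it only assumes a list of hypothetical per-site likelihood values and shows why multiplying many small probabilities underflows in double precision while summing their logarithms does not.

```python
import math


def naive_likelihood(site_likelihoods):
    """Multiply per-site likelihoods directly (prone to underflow)."""
    product = 1.0
    for p in site_likelihoods:
        product *= p
    return product


def log_likelihood(site_likelihoods):
    """Sum per-site log-likelihoods instead of multiplying raw values."""
    return sum(math.log(p) for p in site_likelihoods)


# Hypothetical values: 100 sites, each with likelihood 1e-5.
site_likelihoods = [1e-5] * 100

naive = naive_likelihood(site_likelihoods)   # true value 1e-500 is below the double range
stable = log_likelihood(site_likelihoods)    # 100 * ln(1e-5), roughly -1151.29

print(naive)   # 0.0
print(stable)
```

On trees with tens of thousands of taxa the same effect also appears inside the per-node likelihood computations, which is why implementations rescale intermediate values or work in log space.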
Affiliation(s)
- Fernando Izquierdo-Carrasco
- The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
| | - Stephen A Smith
- The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
- Smith Lab, Dept. Ecology and Evolutionary Biology, University of Michigan, 2005 Kraus Natural Science Building, Ann Arbor, MI 48109-1048 USA
| | - Alexandros Stamatakis
- The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
|