1
|
Chesters D, Bossert S, Orr MC. [genus]_[species]; Presenting phylogenies to facilitate synthesis. Cladistics 2025; 41:177-192. [PMID: 39673226 DOI: 10.1111/cla.12601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2024] [Revised: 11/23/2024] [Accepted: 11/25/2024] [Indexed: 12/16/2024] Open
Abstract
Each published phylogeny is a potential contribution to the synthesis of the Tree of Life and countless downstream projects. Steps are needed for fully synthesizable science, but only a minority of studies achieve these. We here review the range of phylogenetic presentation and note aspects that hinder further analysis. We provide simple suggestions on publication that would greatly enhance utilizability, and propose a formal grammar for phylogeny terminal format. We suggest that each published phylogeny should be accompanied by at minimum the single preferred result in machine readable tree (e.g. Newick) form in the supplement, a simple task fulfilled by fewer than half of studies. Further, the tree should be clear from the file name and extension; the orientation (rooted or unrooted) should match the figures; terminal labels should include genus and species IDs; underscores should separate strings within-field (instead of white spaces); and if other informational fields are added these should be separated by a unique delimiting character (we suggest multiple underscores or the vertical pipe character, |) and ordered consistently. These requirements are largely independent of phylogenetic study aims, while we note other requirements for synthesis (e.g. removal of species repeats and uninformative terminals) that are not necessarily the responsibility of authors. Machine readable trees show greater variation in terminal formatting than typical phylogeny images (owing presumably to greater scrutiny of the latter), and thus are complex and laborious to parse. Since the majority of existing studies have provided only images, we additionally review typical variation in plotting style, information that will be necessary for developing the automated phylogeny transcription tools needed for their eventual inclusion in the Tree of Life.
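The terminal-label convention described in this abstract (genus and species joined by an underscore, further fields separated by a unique delimiter such as the vertical pipe) can be illustrated with a minimal Python sketch. This is not the formal grammar proposed in the paper: the regular expression, the `Terminal` type, the `parse_terminal` function, and the example label are all hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple, Tuple


class Terminal(NamedTuple):
    """Hypothetical container for one parsed terminal label."""
    genus: str
    species: str
    extra: Tuple[str, ...]


# Assumption: a capitalised genus, one underscore, then a lower-case
# species epithet. Real labels may need a more permissive pattern.
_BINOMIAL = re.compile(r"^([A-Z][a-z]+)_([a-z][a-z-]*)$")


def parse_terminal(label: str, delimiter: str = "|") -> Terminal:
    """Split a label into genus, species, and any extra fields."""
    fields = label.strip().split(delimiter)
    match = _BINOMIAL.match(fields[0])
    if match is None:
        raise ValueError(f"first field is not genus_species: {fields[0]!r}")
    return Terminal(match.group(1), match.group(2), tuple(fields[1:]))


if __name__ == "__main__":
    # Extra fields keep underscores within-field, as the abstract suggests.
    print(parse_terminal("Apis_mellifera|voucher_123|China"))
```

A label that uses white space instead of an underscore, such as `Apis mellifera`, is rejected with a `ValueError`, which is the kind of inconsistency the abstract notes makes machine-readable trees laborious to parse.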
Affiliation(s)
- Douglas Chesters
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, 100101, China
- International College, University of Chinese Academy of Sciences, Shijingshan District, Beijing, 100049, China
| | - Silas Bossert
- Department of Entomology, Washington State University, 1945 Ferdinand's Ln, Pullman, WA, 99163, USA
| | - Michael C Orr
- Entomologie, Staatliches Museum für Naturkunde Stuttgart, Rosenstein 1, Stuttgart, 70191, Germany
| |
|
2
|
Shen XX, Li Y, Hittinger CT, Chen XX, Rokas A. An investigation of irreproducibility in maximum likelihood phylogenetic inference. Nat Commun 2020; 11:6096. [PMID: 33257660 PMCID: PMC7705714 DOI: 10.1038/s41467-020-20005-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 06/24/2020] [Accepted: 11/05/2020] [Indexed: 01/09/2023] Open
Abstract
Phylogenetic trees are essential for studying biology, but their reproducibility under identical parameter settings remains unexplored. Here, we find that 3515 (18.11%) IQ-TREE-inferred and 1813 (9.34%) RAxML-NG-inferred maximum likelihood (ML) gene trees are topologically irreproducible when executing two replicates (Run1 and Run2) for each of 19,414 gene alignments in 15 animal, plant, and fungal phylogenomic datasets. Notably, coalescent-based ASTRAL species phylogenies inferred from Run1 and Run2 sets of individual gene trees are topologically irreproducible for 9/15 phylogenomic datasets, whereas concatenation-based phylogenies inferred twice from the same supermatrix are reproducible. Our simulations further show that irreproducible phylogenies are more likely to be incorrect than reproducible phylogenies. These results suggest that a considerable fraction of single-gene ML trees may be irreproducible. Increasing reproducibility in ML inference will benefit from providing analyses’ log files, which contain typically reported parameters (e.g., program, substitution model, number of tree searches) but also typically unreported ones (e.g., random starting seed number, number of threads, processor type). Replicate runs of maximum likelihood phylogenetic analyses can generate different tree topologies due to differences in parameters, such as random seeds. Here, Shen et al. demonstrate that replicate runs can generate substantially different tree topologies even with identical data and parameters.
Affiliation(s)
- Xing-Xing Shen
- State Key Laboratory of Rice Biology, Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, 310058, Hangzhou, China; Institute of Insect Sciences, Zhejiang University, 310058, Hangzhou, China
| | - Yuanning Li
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37235, USA
| | - Chris Todd Hittinger
- Laboratory of Genetics, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, Center for Genomic Science Innovation, University of Wisconsin-Madison, Madison, WI, 53706, USA; DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI, 53706, USA
| | - Xue-Xin Chen
- State Key Laboratory of Rice Biology, Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, 310058, Hangzhou, China; Institute of Insect Sciences, Zhejiang University, 310058, Hangzhou, China
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37235, USA.
| |
|
3
|
Nguyen VD, Nguyen TH, Tayeen ASM, Laughinghouse HD, Sánchez-Reyes LL, Wiggins J, Pontelli E, Mozzherin D, O’Meara B, Stoltzfus A. Phylotastic: Improving Access to Tree-of-Life Knowledge With Flexible, on-the-Fly Delivery of Trees. Evol Bioinform Online 2020; 16:1176934319899384. [PMID: 32372858 PMCID: PMC7192527 DOI: 10.1177/1176934319899384] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/31/2019] [Accepted: 11/20/2019] [Indexed: 11/15/2022] Open
Abstract
A comprehensive phylogeny of species, i.e., a tree of life, has potential uses in a variety of contexts, including research, education, and public policy. Yet, accessing the tree of life typically requires special knowledge, complex software, or long periods of training. The Phylotastic project aims to make it as easy to get a phylogeny of species as it is to get driving directions from mapping software. In prior work, we presented a design for an open system to validate and manage taxon names, find phylogeny resources, extract subtrees matching a user's taxon list, scale trees to time, and integrate related resources such as species images. Here, we report the implementation of a set of tools that together represent a robust, accessible system for on-the-fly delivery of phylogenetic knowledge. This set of tools includes a web portal to execute several customizable workflows to obtain species phylogenies (scaled by geologic time and decorated with thumbnail images); more than 30 underlying web services (accessible via a common registry); and code toolkits in R and Python (allowing others to develop custom applications using Phylotastic services). The Phylotastic system, accessible via http://www.phylotastic.org, provides a unique resource to access the current state of phylogenetic knowledge, useful for a variety of cases in which a tree extracted quickly from online resources (as distinct from a tree custom-made from character data) is sufficient, as it is for many casual uses of trees identified here.
Affiliation(s)
- Van D Nguyen
- Department of Computer Science, New Mexico State University, Las Cruces, NM, USA
| | - Thanh H Nguyen
- Department of Computer Science, New Mexico State University, Las Cruces, NM, USA
| | - Abu Saleh Md Tayeen
- Department of Computer Science, New Mexico State University, Las Cruces, NM, USA
| | - H Dail Laughinghouse
- Institute for Bioscience and Biotechnology Research, Rockville, MD, USA
- Fort Lauderdale Research and Education Center, University of Florida/IFAS, Davie, FL, USA
| | - Luna L Sánchez-Reyes
- Department of Ecology and Evolutionary Biology, The University of Tennessee, Knoxville, Knoxville, TN, USA
| | - Jodie Wiggins
- Department of Ecology and Evolutionary Biology, The University of Tennessee, Knoxville, Knoxville, TN, USA
| | - Enrico Pontelli
- Department of Computer Science, New Mexico State University, Las Cruces, NM, USA
| | - Dmitry Mozzherin
- Illinois Natural History Survey, Species File Group, University of Illinois at Urbana–Champaign, Champaign, IL, USA
| | - Brian O’Meara
- Department of Ecology and Evolutionary Biology, The University of Tennessee, Knoxville, Knoxville, TN, USA
| | - Arlin Stoltzfus
- Institute for Bioscience and Biotechnology Research, Rockville, MD, USA
- Office of Data and Informatics, Material Measurement Laboratory, NIST, Gaithersburg, MD, USA
| |
|
4
|
Baker E, Vincent S. A deafening silence: a lack of data and reproducibility in published bioacoustics research? Biodivers Data J 2019; 7:e36783. [PMID: 31723333 PMCID: PMC6834726 DOI: 10.3897/bdj.7.e36783] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/04/2019] [Accepted: 10/29/2019] [Indexed: 11/12/2022] Open
Abstract
A study of 100 papers from five journals that make use of bioacoustic recordings shows that only a minority (21%) deposit any of the recordings in a repository, supplementary materials section or a personal website. This lack of deposition hinders re-use of the raw data by other researchers, prevents the reproduction of a project's analyses and confirmation of its findings and impedes progress within the broader bioacoustics community. We make some recommendations for researchers interested in depositing their data.
Affiliation(s)
- Ed Baker
- Natural History Museum, London, United Kingdom
- University of York, York, United Kingdom
| | - Sarah Vincent
- Natural History Museum, London, United Kingdom
| |
|
5
|
Stöver BC, Wiechers S, Müller KF. JPhyloIO: a Java library for event-based reading and writing of different phylogenetic file formats through a common interface. BMC Bioinformatics 2019; 20:402. [PMID: 31331268 PMCID: PMC6647125 DOI: 10.1186/s12859-019-2982-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/17/2019] [Accepted: 07/02/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Today a variety of phylogenetic file formats exists, some of which are well-established but limited in their data model, while other more recently introduced ones offer advanced features for metadata representation. Although most currently available software only supports the classical formats with a limited metadata model, it would be desirable to have support for the more advanced formats. This is necessary for users to produce richly annotated data that can be efficiently reused and make underlying workflows easily reproducible. A programming library that abstracts over the data and metadata models of the different formats and allows supporting all of them in one step would significantly simplify the development of new and the extension of existing software to address the need for better metadata annotation. RESULTS: We developed the Java library JPhyloIO, which allows event-based reading and writing of the most common alignment and tree/network formats. It allows full access to all features of the nine currently supported formats. By implementing a single JPhyloIO-based reader and writer, application developers can support all of these formats. Due to the event-based architecture, JPhyloIO can be combined with any application data structure, and is memory efficient for large datasets. JPhyloIO is distributed under LGPL. Detailed documentation and example applications (available on http://bioinfweb.info/JPhyloIO/) significantly lower the entry barrier for bioinformaticians who wish to benefit from JPhyloIO's features in their own software. CONCLUSION: JPhyloIO enables simplified development of new and extension of existing applications that support various standard formats simultaneously. This has the potential to improve interoperability between phylogenetic software tools and at the same time motivate usage of more recent metadata-rich formats such as NeXML or phyloXML.
Affiliation(s)
- Ben C Stöver
- Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany.
| | - Sarah Wiechers
- Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany
| | - Kai F Müller
- Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany
| |
|
6
|
Chang J, Rabosky DL, Smith SA, Alfaro ME. An R package and online resource for macroevolutionary studies using the ray‐finned fish tree of life. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13182] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 11/27/2022]
Affiliation(s)
- Jonathan Chang
- School of Biological Sciences, Monash University, Clayton, VIC, Australia
| | - Daniel L. Rabosky
- Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
| | - Stephen A. Smith
- Museum of Zoology, Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
| | - Michael E. Alfaro
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA
| |
|
7
|
Bravo GA, Antonelli A, Bacon CD, Bartoszek K, Blom MPK, Huynh S, Jones G, Knowles LL, Lamichhaney S, Marcussen T, Morlon H, Nakhleh LK, Oxelman B, Pfeil B, Schliep A, Wahlberg N, Werneck FP, Wiedenhoeft J, Willows-Munro S, Edwards SV. Embracing heterogeneity: coalescing the Tree of Life and the future of phylogenomics. PeerJ 2019; 7:e6399. [PMID: 30783571 PMCID: PMC6378093 DOI: 10.7717/peerj.6399] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 02/05/2018] [Accepted: 01/07/2019] [Indexed: 12/23/2022] Open
Abstract
Building the Tree of Life (ToL) is a major challenge of modern biology, requiring advances in cyberinfrastructure, data collection, theory, and more. Here, we argue that phylogenomics stands to benefit by embracing the many heterogeneous genomic signals emerging from the first decade of large-scale phylogenetic analysis spawned by high-throughput sequencing (HTS). Such signals include those most commonly encountered in phylogenomic datasets, such as incomplete lineage sorting, but also those reticulate processes emerging with greater frequency, such as recombination and introgression. Here we focus specifically on how phylogenetic methods can accommodate the heterogeneity incurred by such population genetic processes; we do not discuss phylogenetic methods that ignore such processes, such as concatenation or supermatrix approaches or supertrees. We suggest that methods of data acquisition and the types of markers used in phylogenomics will remain restricted until a posteriori methods of marker choice are made possible with routine whole-genome sequencing of taxa of interest. We discuss limitations and potential extensions of a model supporting innovation in phylogenomics today, the multispecies coalescent model (MSC). Macroevolutionary models that use phylogenies, such as character mapping, often ignore the heterogeneity on which building phylogenies increasingly relies, and we suggest that assimilating such heterogeneity is an important goal moving forward. Finally, we argue that an integrative cyberinfrastructure linking all steps of the process of building the ToL, from specimen acquisition in the field to publication and tracking of phylogenomic data, as well as a culture that values contributors at each step, are essential for progress.
Affiliation(s)
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | - Alexandre Antonelli
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
- Gothenburg Botanical Garden, Göteborg, Sweden
| | - Christine D. Bacon
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - Krzysztof Bartoszek
- Department of Computer and Information Science, Linköping University, Linköping, Sweden
| | - Mozes P. K. Blom
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
| | - Stella Huynh
- Institut de Biologie, Université de Neuchâtel, Neuchâtel, Switzerland
| | - Graham Jones
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - L. Lacey Knowles
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
| | - Sangeet Lamichhaney
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | - Thomas Marcussen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway
| | - Hélène Morlon
- Institut de Biologie, Ecole Normale Supérieure de Paris, Paris, France
| | - Luay K. Nakhleh
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Bengt Oxelman
- Gothenburg Global Biodiversity Centre, Göteborg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - Bernard Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Göteborg, Sweden
| | - Alexander Schliep
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
| | | | - Fernanda P. Werneck
- Coordenação de Biodiversidade, Programa de Coleções Científicas Biológicas, Instituto Nacional de Pesquisa da Amazônia, Manaus, AM, Brazil
| | - John Wiedenhoeft
- Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- Department of Computer Science, Rutgers University, Piscataway, NJ, USA
| | - Sandi Willows-Munro
- School of Life Sciences, University of Kwazulu-Natal, Pietermaritzburg, South Africa
| | - Scott V. Edwards
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
- Gothenburg Centre for Advanced Studies in Science and Technology, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
| |
|
8
|
Eiserhardt WL, Antonelli A, Bennett DJ, Botigué LR, Burleigh JG, Dodsworth S, Enquist BJ, Forest F, Kim JT, Kozlov AM, Leitch IJ, Maitner BS, Mirarab S, Piel WH, Pérez-Escobar OA, Pokorny L, Rahbek C, Sandel B, Smith SA, Stamatakis A, Vos RA, Warnow T, Baker WJ. A roadmap for global synthesis of the plant tree of life. American Journal of Botany 2018; 105:614-622. [PMID: 29603138 DOI: 10.1002/ajb2.1041] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/13/2017] [Accepted: 11/08/2017] [Indexed: 06/08/2023]
Abstract
Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis.
Affiliation(s)
- Wolf L Eiserhardt
- Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
- Department of Bioscience, Aarhus University, Ny Munkegade 116, 8000, Aarhus C, Denmark
| | - Alexandre Antonelli
- Gothenburg Global Biodiversity Centre, Box 461, 405 30, Gothenburg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
- Gothenburg Botanical Garden, Carl Skottsbergs Gata 22B, SE-413 19, Gothenburg, Sweden
| | - Dominic J Bennett
- Gothenburg Global Biodiversity Centre, Box 461, 405 30, Gothenburg, Sweden
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
- Gothenburg Botanical Garden, Carl Skottsbergs Gata 22B, SE-413 19, Gothenburg, Sweden
| | | | | | | | - Brian J Enquist
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
- The Santa Fe Institute, Santa Fe, NM, 87501, USA
| | - Félix Forest
- Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
| | - Jan T Kim
- Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
| | - Alexey M Kozlov
- Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118, Heidelberg, Germany
| | - Ilia J Leitch
- Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
| | - Brian S Maitner
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, 92093, USA
| | - William H Piel
- Yale-NUS College, 16 College Avenue West, Singapore, 138527, Republic of Singapore
| | | | - Lisa Pokorny
- Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
| | - Carsten Rahbek
- Center for Macroecology, Evolution and Climate, University of Copenhagen, Universitetsparken 15, DK-2100, Copenhagen O, Denmark
- Imperial College London, Silwood Park, Buckhurst Road, Ascot, Berkshire, SL5 7PY, UK
| | - Brody Sandel
- Department of Biology, Santa Clara University, Santa Clara, CA, 95053, USA
| | - Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Alexandros Stamatakis
- Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128, Karlsruhe, Germany
| | - Rutger A Vos
- Naturalis Biodiversity Center, P.O. Box 9517, 2300RA, Leiden, The Netherlands
- Institute of Biology Leiden, P.O. Box 9505, 2300RA, Leiden, The Netherlands
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| | | |
|
9
|
The development of scientific consensus: Analyzing conflict and concordance among avian phylogenies. Mol Phylogenet Evol 2017; 116:69-77. [DOI: 10.1016/j.ympev.2017.08.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 03/31/2017] [Revised: 08/03/2017] [Accepted: 08/06/2017] [Indexed: 11/22/2022]
|
10
|
McTavish EJ, Drew BT, Redelings B, Cranston KA. How and Why to Build a Unified Tree of Life. Bioessays 2017; 39. [PMID: 28980328 DOI: 10.1002/bies.201700114] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 06/30/2017] [Revised: 08/27/2017] [Indexed: 01/20/2023]
Abstract
Phylogenetic trees are a crucial backbone for a wide breadth of biological research spanning systematics, organismal biology, ecology, and medicine. In 2015, the Open Tree of Life project published a first draft of a comprehensive tree of life, summarizing digitally available taxonomic and phylogenetic knowledge. This paper reviews, investigates, and addresses the following questions as a follow-up to that paper, from the perspective of researchers involved in building this summary of the tree of life: Is there a tree of life and should we reconstruct it? Is available data sufficient to reconstruct the tree of life? Do we have access to phylogenetic inferences in usable form? Can we combine different phylogenetic estimates across the tree of life? And finally, what is the future of understanding the tree of life?
Affiliation(s)
| | - Bryan T Drew
- University of Nebraska at Kearney, Kearney, NE, 68849, USA
| | - Ben Redelings
- University of Kansas, Lawrence, KS, 66045, USA; Duke University, Durham, NC 27705, USA; Ronin Institute, Durham, NC 27705, USA
| | | |
|
11
|
Mounce R, Murray-Rust P, Wills M. A machine-compiled microbial supertree from figure-mining thousands of papers. Research Ideas and Outcomes 2017. [DOI: 10.3897/rio.3.e13589] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022] Open
|
12
|
Penev L, Mietchen D, Chavan V, Hagedorn G, Smith V, Shotton D, Ó Tuama É, Senderov V, Georgiev T, Stoev P, Groom Q, Remsen D, Edmunds S. Strategies and guidelines for scholarly publishing of biodiversity data. Research Ideas and Outcomes 2017. [DOI: 10.3897/rio.3.e12431] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Indexed: 01/17/2023] Open
|
13
|
Michonneau F, Brown JW, Winter DJ. rotl: an R package to interact with the Open Tree of Life data. Methods Ecol Evol 2016. [DOI: 10.1111/2041-210x.12593] [Citation(s) in RCA: 202] [Impact Index Per Article: 22.4] [Indexed: 11/26/2022]
Affiliation(s)
- François Michonneau
- Whitney Laboratory for Marine Sciences, University of Florida, St. Augustine, FL 32080, USA
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611‐7800, USA
| | - Joseph W. Brown
- Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
| | - David J. Winter
- Virginia G. Piper Centre for Personalized Diagnostics, The Biodesign Institute, Arizona State University, Tempe, AZ 85287‐5001, USA
| |
|
14
|
Dececchi TA, Balhoff JP, Lapp H, Mabee PM. Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies. Syst Biol 2015; 64:936-52. [PMID: 26018570 PMCID: PMC4604830 DOI: 10.1093/sysbio/syv031] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 11/24/2014] [Accepted: 05/20/2015] [Indexed: 02/02/2023] Open
Abstract
The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in noncomputer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. 
Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
Affiliation(s)
| | - James P Balhoff
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; University of North Carolina, Chapel Hill, NC 27599, USA
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Center for Genomics and Computational Biology, Duke University, Durham, NC 27708, USA
| | - Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD 57069, USA
| |
|
15
|
Boettiger C, Chamberlain S, Vos R, Lapp H. RNeXML: a package for reading and writing richly annotated phylogenetic, character and trait data in R. Methods Ecol Evol 2015. [DOI: 10.1111/2041-210x.12469] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
Affiliation(s)
- Carl Boettiger
- Department of Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
- Rutger Vos
- Naturalis Biodiversity Center, Leiden, The Netherlands
- Hilmar Lapp
- Center for Genomic and Computational Biology, National Evolutionary Synthesis Center, Duke University, Durham, NC, USA
|
16
|
Hinchliff CE, Smith SA, Allman JF, Burleigh JG, Chaudhary R, Coghill LM, Crandall KA, Deng J, Drew BT, Gazis R, Gude K, Hibbett DS, Katz LA, Laughinghouse HD, McTavish EJ, Midford PE, Owen CL, Ree RH, Rees JA, Soltis DE, Williams T, Cranston KA. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc Natl Acad Sci U S A 2015; 112:12764-9. [PMID: 26385966 PMCID: PMC4611642 DOI: 10.1073/pnas.1423041112] [Citation(s) in RCA: 391] [Impact Index Per Article: 39.1] [Indexed: 12/20/2022] Open
Abstract
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips: the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
Affiliation(s)
- Cody E Hinchliff
- Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Stephen A Smith
- Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Ruchi Chaudhary
- Department of Biology, University of Florida, Gainesville, FL 32611
- Keith A Crandall
- Computational Biology Institute, George Washington University, Ashburn, VA 20147
- Jiabin Deng
- Department of Biology, University of Florida, Gainesville, FL 32611
- Bryan T Drew
- Department of Biology, University of Nebraska-Kearney, Kearney, NE 68849
- Romina Gazis
- Department of Biology, Clark University, Worcester, MA 01610
- Karl Gude
- School of Journalism, Michigan State University, East Lansing, MI 48824
- David S Hibbett
- Department of Biology, Clark University, Worcester, MA 01610
- Laura A Katz
- Biological Science, Clark Science Center, Smith College, Northampton, MA 01063
- Emily Jane McTavish
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045
- Jonathan A Rees
- National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
- Douglas E Soltis
- Department of Biology, University of Florida, Gainesville, FL 32611; Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Tiffani Williams
- Computer Science and Engineering, Texas A&M University, College Station, TX 77843
- Karen A Cranston
- National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
|
17
|
Pope LC, Liggins L, Keyse J, Carvalho SB, Riginos C. Not the time or the place: the missing spatio-temporal link in publicly available genetic data. Mol Ecol 2015; 24:3802-9. [DOI: 10.1111/mec.13254] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 01/04/2015] [Revised: 05/07/2015] [Accepted: 05/22/2015] [Indexed: 11/29/2022]
Affiliation(s)
- Lisa C. Pope
- School of Biological Sciences; The University of Queensland; Brisbane Qld 4072 Australia
- Libby Liggins
- Allan Wilson Centre for Molecular Ecology and Evolution; New Zealand Institute for Advanced Study; Institute of Natural and Mathematical Sciences; Massey University; Auckland 0745 New Zealand
- Auckland War Memorial Museum; Tāmaki Paenga Hira; Auckland 1142 New Zealand
- Jude Keyse
- School of Biological Sciences; The University of Queensland; Brisbane Qld 4072 Australia
- Silvia B Carvalho
- CIBIO/InBIO - Centro de Investigação em Biodiversidade e Recursos Genéticos da Universidade do Porto; R. Padre Armando Quintas 4485-661 Vairão Portugal
- Cynthia Riginos
- School of Biological Sciences; The University of Queensland; Brisbane Qld 4072 Australia
|
18
|
McTavish EJ, Hinchliff CE, Allman JF, Brown JW, Cranston KA, Holder MT, Rees JA, Smith SA. Phylesystem: a git-based data store for community-curated phylogenetic estimates. Bioinformatics 2015; 31:2794-800. [PMID: 25940563 PMCID: PMC4547614 DOI: 10.1093/bioinformatics/btv276] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 01/15/2015] [Accepted: 04/27/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct.
RESULTS: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git's version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the 'phylesystem-api', which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements.
AVAILABILITY AND IMPLEMENTATION: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator. Code for that tool is available from https://github.com/OpenTreeOfLife/opentree.
CONTACT: mtholder@gmail.com.
Affiliation(s)
- Emily Jane McTavish
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Cody E Hinchliff
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Joseph W Brown
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Karen A Cranston
- National Evolutionary Synthesis Center, Duke University, Durham, NC, USA
- Mark T Holder
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS, USA, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Jonathan A Rees
- National Evolutionary Synthesis Center, Duke University, Durham, NC, USA
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
|
19
|
|
20
|
Magee AF, May MR, Moore BR. The dawn of open access to phylogenetic data. PLoS One 2014; 9:e110268. [PMID: 25343725 PMCID: PMC4208793 DOI: 10.1371/journal.pone.0110268] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 06/09/2014] [Accepted: 09/09/2014] [Indexed: 01/29/2023] Open
Abstract
The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). Increasingly, phylogenies are estimated from increasingly large, genome-scale datasets using increasingly complex statistical methods that require increasing levels of expertise and computational investment. Moreover, the resulting phylogenetic data provide an explicit historical perspective that critically informs research in a vast and growing number of scientific disciplines. One such use is the study of changes in rates of lineage diversification (speciation--extinction) through time. As part of a meta-analysis in this area, we sought to collect phylogenetic data (comprising nucleotide sequence alignment and tree files) from 217 studies published in 46 journals over a 13-year period. We document our attempts to procure those data (from online archives and by direct request to corresponding authors), and report results of analyses (using Bayesian logistic regression) to assess the impact of various factors on the success of our efforts. Overall, complete phylogenetic data for [Formula: see text] of these studies are effectively lost to science. Our study indicates that phylogenetic data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor, and; (3) the data are requested from faculty rather than students. Importantly, our survey spans recent policy initiatives and infrastructural changes; our analyses indicate that the positive impact of these community initiatives has been both dramatic and immediate. Although the results of our study indicate that the situation is dire, our findings also reveal tremendous recent progress in the sharing and preservation of phylogenetic data.
Affiliation(s)
- Andrew F. Magee
- Department of Evolution and Ecology, University of California Davis, Davis, CA, United States of America
- Michael R. May
- Department of Evolution and Ecology, University of California Davis, Davis, CA, United States of America
- Brian R. Moore
- Department of Evolution and Ecology, University of California Davis, Davis, CA, United States of America
|
21
|
Cranston K, Harmon LJ, O'Leary MA, Lisle C. Best practices for data sharing in phylogenetic research. PLoS Curr 2014; 6:ecurrents.tol.bf01eff4a6b60ca4825c69293dc59645. [PMID: 24987572 PMCID: PMC4073804 DOI: 10.1371/currents.tol.bf01eff4a6b60ca4825c69293dc59645] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022]
Abstract
As phylogenetic data become increasingly available, along with associated data on species' genomes, traits, and geographic distributions, the need to ensure data availability and reuse becomes more and more acute. In this paper, we provide ten "simple rules" that we view as best practices for data sharing in phylogenetic research. These rules will help lead towards a future phylogenetics where data can easily be archived, shared, reused, and repurposed across a wide variety of projects.
Affiliation(s)
- Karen Cranston
- National Evolutionary Synthesis Center, Duke University, Durham, North Carolina, USA
- Luke J Harmon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
- Maureen A O'Leary
- Department of Anatomical Sciences, Stony Brook University, Stony Brook, New York, USA
|
22
|
Kenall A, Harold S, Foote C. An open future for ecological and evolutionary data? BMC Evol Biol 2014; 14:66. [PMID: 24690275 PMCID: PMC3992160 DOI: 10.1186/1471-2148-14-66] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/25/2014] [Accepted: 03/25/2014] [Indexed: 11/19/2022] Open
Abstract
As part of BioMed Central's open science mission, we are pleased to announce that two of our journals have integrated with the open data repository Dryad. Authors submitting their research to either BMC Ecology or BMC Evolutionary Biology will now have the opportunity to deposit their data directly into the Dryad archive and will receive a permanent, citable link to their dataset. Although this does not affect any of our current data deposition policies at these journals, we hope to encourage a more widespread adoption of open data sharing in the fields of ecology and evolutionary biology by facilitating this process for our authors. We also take this opportunity to discuss some of the wider issues that may concern researchers when making their data openly available. Although we offer a number of positive examples from different fields of biology, we also recognise that reticence to data sharing still exists, and that change must be driven from within research communities in order to create future science that is fit for purpose in the digital age. This editorial was published jointly in both BMC Ecology and BMC Evolutionary Biology.
Affiliation(s)
- Amye Kenall
- BioMed Central, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK
- Simon Harold
- BioMed Central, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK
|
23
|
Kenall A, Harold S, Foote C. An open future for ecological and evolutionary data? BMC Ecol 2014; 14:10. [PMID: 24690219 PMCID: PMC3992165 DOI: 10.1186/1472-6785-14-10] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 03/25/2014] [Accepted: 03/25/2014] [Indexed: 01/14/2023] Open
Abstract
As part of BioMed Central's open science mission, we are pleased to announce that two of our journals have integrated with the open data repository Dryad. Authors submitting their research to either BMC Ecology or BMC Evolutionary Biology will now have the opportunity to deposit their data directly into the Dryad archive and will receive a permanent, citable link to their dataset. Although this does not affect any of our current data deposition policies at these journals, we hope to encourage a more widespread adoption of open data sharing in the fields of ecology and evolutionary biology by facilitating this process for our authors. We also take this opportunity to discuss some of the wider issues that may concern researchers when making their data openly available. Although we offer a number of positive examples from different fields of biology, we also recognise that reticence to data sharing still exists, and that change must be driven from within research communities in order to create future science that is fit for purpose in the digital age. This editorial was published jointly in both BMC Ecology and BMC Evolutionary Biology.
Affiliation(s)
- Amye Kenall
- BioMed Central, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK
- Simon Harold
- BioMed Central, Floor 6, 236 Gray’s Inn Road, London WC1X 8HB, UK
|
24
|
Walls RL, Deck J, Guralnick R, Baskauf S, Beaman R, Blum S, Bowers S, Buttigieg PL, Davies N, Endresen D, Gandolfo MA, Hanner R, Janning A, Krishtalka L, Matsunaga A, Midford P, Morrison N, Tuama ÉÓ, Schildhauer M, Smith B, Stucky BJ, Thomer A, Wieczorek J, Whitacre J, Wooley J. Semantics in support of biodiversity knowledge discovery: an introduction to the biological collections ontology and related ontologies. PLoS One 2014; 9:e89606. [PMID: 24595056 PMCID: PMC3940615 DOI: 10.1371/journal.pone.0089606] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 06/08/2013] [Accepted: 01/24/2014] [Indexed: 11/19/2022] Open
Abstract
The study of biodiversity spans many disciplines and includes data pertaining to species distributions and abundances, genetic sequences, trait measurements, and ecological niches, complemented by information on collection and measurement protocols. A review of the current landscape of metadata standards and ontologies in biodiversity science suggests that existing standards such as the Darwin Core terminology are inadequate for describing biodiversity data in a semantically meaningful and computationally useful way. Existing ontologies, such as the Gene Ontology and others in the Open Biological and Biomedical Ontologies (OBO) Foundry library, provide a semantic structure but lack many of the necessary terms to describe biodiversity data in all its dimensions. In this paper, we describe the motivation for and ongoing development of a new Biological Collections Ontology, the Environment Ontology, and the Population and Community Ontology. These ontologies share the aim of improving data aggregation and integration across the biodiversity domain and can be used to describe physical samples and sampling processes (for example, collection, extraction, and preservation techniques), as well as biodiversity observations that involve no physical sampling. Together they encompass studies of: 1) individual organisms, including voucher specimens from ecological studies and museum specimens, 2) bulk or environmental samples (e.g., gut contents, soil, water) that include DNA, other molecules, and potentially many organisms, especially microbes, and 3) survey-based ecological observations. We discuss how these ontologies can be applied to biodiversity use cases that span genetic, organismal, and ecosystem levels of organization. We argue that if adopted as a standard and rigorously applied and enriched by the biodiversity community, these ontologies would significantly reduce barriers to data discovery, integration, and exchange among biodiversity resources and researchers.
Affiliation(s)
- Ramona L. Walls
- The iPlant Collaborative, University of Arizona, Tucson, Arizona, United States of America
- John Deck
- University of California, Berkeley, Berkeley, California, United States of America
- Robert Guralnick
- Department of Ecology and Evolutionary Biology and the CU Museum of Natural History, University of Colorado at Boulder, Boulder, Colorado, United States of America
- Steve Baskauf
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee, United States of America
- Reed Beaman
- University of Florida, Florida Museum of Natural History, Gainesville, Florida, United States of America
- Stanley Blum
- Research Informatics, California Academy of Sciences, San Francisco, California, United States of America
- Shawn Bowers
- Gonzaga University, Computer Science, Spokane, Washington, United States of America
- Pier Luigi Buttigieg
- Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research, Bremerhaven, Germany
- Neil Davies
- University of California, Berkeley, Gump South Pacific Research Station, Moorea, French Polynesia
- Dag Endresen
- GBIF Norway, Natural History Museum, University of Oslo, Oslo, Norway
- Maria Alejandra Gandolfo
- LH Bailey Hortorium, Department of Plant Biology, Cornell University, Ithaca, New York, United States of America
- Robert Hanner
- Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, Canada
- Alyssa Janning
- School of Information Resources and Library Science, University of Arizona, Tucson, Arizona, United States of America
- Leonard Krishtalka
- Biodiversity Institute and Ecology & Evolutionary Biology, The University of Kansas, Lawrence, Kansas, United States of America
- Andréa Matsunaga
- University of Florida, Gainesville, Florida, United States of America
- Peter Midford
- Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas, United States of America
- Norman Morrison
- The BioVeL Project, School of Computer Science, The University of Manchester, Manchester, United Kingdom
- Mark Schildhauer
- National Center for Ecological Analysis and Synthesis, Santa Barbara, California, United States of America
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, New York, United States of America
- Brian J. Stucky
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, Colorado, United States of America
- Andrea Thomer
- Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, United States of America
- John Wieczorek
- 3101 VLSB, Museum of Vertebrate Zoology, University of California, Berkeley, Berkeley, California, United States of America
- Jamie Whitacre
- Informatics Branch, Information Technology Office, National Museum of Natural History, Smithsonian Institution, Washington, DC, United States of America
- John Wooley
- University of California San Diego, La Jolla, California, United States of America
|
25
|
Southan C, Hancock JM. A tale of two drug targets: the evolutionary history of BACE1 and BACE2. Front Genet 2013; 4:293. [PMID: 24381583 PMCID: PMC3865767 DOI: 10.3389/fgene.2013.00293] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 09/21/2013] [Accepted: 11/29/2013] [Indexed: 11/22/2022] Open
Abstract
The beta amyloid (APP) cleaving enzyme (BACE1) has been a drug target for Alzheimer's Disease (AD) since 1999 with lead inhibitors now entering clinical trials. In 2011, the paralog, BACE2, became a new target for type II diabetes (T2DM) having been identified as a TMEM27 secretase regulating pancreatic β cell function. However, the normal roles of both enzymes are unclear. This study outlines their evolutionary history and new opportunities for functional genomics. We identified 30 homologs (UrBACEs) in basal phyla including Placozoans, Cnidarians, Choanoflagellates, Porifera, Echinoderms, Annelids, Mollusks and Ascidians (but not Ecdysozoans). UrBACEs are predominantly single copy, show 35-45% protein sequence identity with mammalian BACE1, are ~100 residues longer than cathepsin paralogs with an aspartyl protease domain flanked by a signal peptide and a C-terminal transmembrane domain. While multiple paralogs in Trichoplax and Monosiga pre-date the nervous system, duplication of the UrBACE in fish gave rise to BACE1 and BACE2 in the vertebrate lineage. The latter evolved more rapidly as the former maintained the emergent neuronal role. In mammals, Ka/Ks for BACE2 is higher than BACE1 but low ratios for both suggest purifying selection. The 5' exons show higher Ka/Ks than the catalytic section. Model organism genomes show the absence of certain BACE human substrates when the UrBACE is present. Experiments could thus reveal undiscovered substrates and roles. The human protease double-target status means that evolutionary trajectories and functional shifts associated with different substrates will have implications for the development of clinical candidates for both AD and T2DM. A rational basis for inhibition specificity ratios and assessing target-related side effects will be facilitated by a more complete picture of BACE1 and BACE2 functions informed by their evolutionary context.
Affiliation(s)
- Christopher Southan
- IUPHAR Database and Guide to Pharmacology Web Portal Group, University/BHF Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
- John M. Hancock
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
|
26
|
Panahiazar M, Sheth AP, Ranabahu A, Vos RA, Leebens-Mack J. Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach. BMC Med Genomics 2013; 6 Suppl 3:S5. [PMID: 24565381 PMCID: PMC3980757 DOI: 10.1186/1755-8794-6-s3-s5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022] Open
Abstract
Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can promote phyloinformatic efforts to promote accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree building methodologies including estimation methods, models and programs. In addition we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetic related documents with PhylOnt Ontology. This approach advances data reuse in phyloinformatics.
|
27
|
Drew BT, Gazis R, Cabezas P, Swithers KS, Deng J, Rodriguez R, Katz LA, Crandall KA, Hibbett DS, Soltis DE. Lost branches on the tree of life. PLoS Biol 2013; 11:e1001636. [PMID: 24019756 PMCID: PMC3760775 DOI: 10.1371/journal.pbio.1001636] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Indexed: 01/15/2023] Open
Abstract
Failure to archive published data can impede reproducibility and inhibit downstream synthesis. Alarmingly, we estimate that ∼70% of existing DNA sequence alignments/phylogenetic trees, representing much of the underpinning of modern phylogenetic analysis, are no longer accessible. The evolutionary biology community needs to adopt policies ensuring that data are publicly archived upon publication.
Affiliation(s)
- Bryan T. Drew
- University of Florida, Gainesville, Florida, United States of America
- Romina Gazis
- Clark University, Worcester, Massachusetts, United States of America
- Patricia Cabezas
- Brigham Young University, Provo, Utah, United States of America
- George Washington University, Washington, DC, United States of America
- Jiabin Deng
- University of Florida, Gainesville, Florida, United States of America
- Roseana Rodriguez
- University of Florida, Gainesville, Florida, United States of America
- Laura A. Katz
- Smith College, Northampton, Massachusetts, United States of America
- Keith A. Crandall
- George Washington University, Washington, DC, United States of America
- David S. Hibbett
- Clark University, Worcester, Massachusetts, United States of America
- Douglas E. Soltis
- University of Florida, Gainesville, Florida, United States of America
- Florida Museum of Natural History, Gainesville, Florida, United States of America
|
28
|
Matasci N, McKay S. Phylogenetic analysis with the iPlant discovery environment. Curr Protoc Bioinformatics 2013; Chapter 6:6.13.1-6.13.13. [PMID: 23749754 DOI: 10.1002/0471250953.bi0613s42] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023]
Abstract
The iPlant Collaborative's Discovery Environment is a unified Web portal to many bioinformatics applications and analytical workflows, including various methods of phylogenetic analysis. This unit describes example protocols for phylogenetic analyses, starting at sequence retrieval from the GenBank sequence database, through to multiple sequence alignment inference and visualization of phylogenetic trees. Methods for extracting smaller sub-trees from very large phylogenies, and the comparative method of continuous ancestral character state reconstruction based on observed morphology of extant species related to their phylogenetic relationships, are also presented.
Affiliation(s)
- Naim Matasci
- The iPlant Collaborative, The University of Arizona, Tucson, Arizona
- Sheldon McKay
- The iPlant Collaborative, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
|
29
|
Stoltzfus A, Lapp H, Matasci N, Deus H, Sidlauskas B, Zmasek CM, Vaidya G, Pontelli E, Cranston K, Vos R, Webb CO, Harmon LJ, Pirrung M, O'Meara B, Pennell MW, Mirarab S, Rosenberg MS, Balhoff JP, Bik HM, Heath TA, Midford PE, Brown JW, McTavish EJ, Sukumaran J, Westneat M, Alfaro ME, Steele A, Jordan G. Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC Bioinformatics 2013; 14:158. [PMID: 23668630 PMCID: PMC3669619 DOI: 10.1186/1471-2105-14-158] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 01/18/2013] [Accepted: 04/30/2013] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces.
RESULTS: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image.
CONCLUSIONS: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.
Affiliation(s)
- Arlin Stoltzfus
  - Institute for Bioscience and Biotechnology Research (IBBR), Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Hilmar Lapp
  - National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
- Naim Matasci
  - The iPlant Collaborative and EEB Department, University of Arizona, 1657 E Helen St, Tucson, AZ, 85721, USA
- Helena Deus
  - Digital Enterprise Research Institute, National University of Ireland, University Road, Galway, Ireland
- Brian Sidlauskas
  - Department of Fisheries and Wildlife, Oregon State University, 104 Nash Hall, Corvallis, OR, 97331-3803, USA
- Christian M Zmasek
  - Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA
- Gaurav Vaidya
  - Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, 80309-0334, USA
- Enrico Pontelli
  - Department of Computer Science, New Mexico State University, MSC CS, Box 30001, Las Cruces, NM, 88003, USA
- Karen Cranston
  - National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
- Rutger Vos
  - NCB Naturalis, Einsteinweg 2, Leiden, 2333 CC, the Netherlands
- Campbell O Webb
  - Arnold Arboretum of Harvard University, Boston, MA, 02130, USA
- Luke J Harmon
  - Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
- Megan Pirrung
  - University of Colorado Denver Anschutz Medical Campus, Aurora, CO, 80045, USA
- Brian O'Meara
  - Department of Ecology & Evolutionary Biology, 569 Dabney Hall, University of Tennessee, Knoxville, TN, 37996, USA
- Matthew W Pennell
  - Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
- Siavash Mirarab
  - Department of Computer Science, University of Texas at Austin, Austin, TX, 78701, USA
- Michael S Rosenberg
  - Center for Evolutionary Medicine and Informatics, The Biodesign Institute, and School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA
- James P Balhoff
  - National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
- Holly M Bik
  - UC Davis Genome Center, One Shields Ave, Davis, CA, 95618, USA
- Tracy A Heath
  - Department of Integrative Biology, University of California, Berkeley, CA, 94720-3140, USA
- Peter E Midford
  - National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
- Joseph W Brown
  - Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
- Jeet Sukumaran
  - Biology Department, Duke University, Biological Sciences Building, 125 Science Drive, Durham, NC, 27708, USA
- Mark Westneat
  - Biodiversity Synthesis Center, Field Museum of Natural History, 1400 S Lakeshore Dr, Chicago, IL, 60605, USA
- Michael E Alfaro
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, 621 Charles E. Young Dr South, Los Angeles, CA, 90095, USA
- Aaron Steele
  - U.C. Berkeley Museum of Vertebrate Zoology, University of California, 3101 Valley Life Sciences Building, Berkeley, CA, 94720, USA
- Greg Jordan
  - Paperpile, 34 Houghton Street, Somerville, MA, 02143, USA