1
Nilsson RH, Andersson AF, Bissett A, Finstad AG, Fossøy F, Grosjean M, Hope M, Jeppesen TS, Kõljalg U, Lundin D, Prager M, Suominen S, Svenningsen CS, Schigel D. Introducing guidelines for publishing DNA-derived occurrence data through biodiversity data platforms. Metabarcoding and Metagenomics 2022. [DOI: 10.3897/mbmg.6.84960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
DNA sequencing efforts of environmental and other biological samples disclose unprecedented and largely untapped opportunities for advances in the taxonomy, ecology, and geographical distributions of our living world. To realise this potential, DNA-derived occurrence data (notably sequences with dates and coordinates) – much like traditional specimens and observations – need to be discoverable and interpretable through biodiversity data platforms. The Global Biodiversity Information Facility (GBIF) recently headed a community effort to assemble a set of guidelines for publishing DNA-derived data. These guidelines target the principles and approaches of exposing DNA-derived occurrence data in the context of broader biodiversity data. They cover a choice of terms using a controlled vocabulary, common pitfalls, and good practices, without going into platform-specific details. Our hope is that they will benefit anyone interested in better exposure of DNA-derived occurrence data through general biodiversity data platforms, including national biodiversity portals. This paper provides a brief rationale and an overview of the guidelines, an up-to-date version of which is maintained at https://doi.org/10.35035/doc-vf1a-nr22. User feedback and interaction are encouraged as new techniques and best practices emerge.
2
Cernava T, Rybakova D, Buscot F, Clavel T, McHardy AC, Meyer F, Meyer F, Overmann J, Stecher B, Sessitsch A, Schloter M, Berg G. Metadata harmonization-Standards are the key for a better usage of omics data for integrative microbiome analysis. Environmental Microbiome 2022; 17:33. [PMID: 35751093 PMCID: PMC9233336 DOI: 10.1186/s40793-022-00425-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 03/25/2022] [Accepted: 05/29/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND: Tremendous amounts of data generated from microbiome research studies during the last decades require not only standards for sampling and preparation of omics data but also clear concepts of how the metadata is prepared to ensure re-use for integrative and interdisciplinary microbiome analysis. RESULTS: In this Commentary, we present our views on the key issues related to the current system for metadata submission in omics research, and propose the development of a global metadata system. Such a system should be easy to use, clearly structured in a hierarchical way, and should be compatible with all existing microbiome data repositories, following common standards for minimal required information and common ontology. Although minimum metadata requirements are essential for microbiome datasets, the immense technological progress requires a flexible system, which will have to be constantly improved and re-thought. While FAIR principles (Findable, Accessible, Interoperable, and Reusable) are already considered, international legal issues on genetic resource and sequence sharing provided by the Convention on Biological Diversity need more awareness and engagement of the scientific community. CONCLUSIONS: The suggested approach for metadata entries would strongly improve retrieving and re-using data as demonstrated in several representative use cases. These integrative analyses, in turn, would further advance the potential of microbiome research for novel scientific discoveries and the development of microbiome-derived products.
Affiliation(s)
- Tomislav Cernava
  - Institute of Environmental Biotechnology, Graz University of Technology, Graz, Austria
- Daria Rybakova
  - Institute of Environmental Biotechnology, Graz University of Technology, Graz, Austria
- François Buscot
  - Soil Ecology Department, Helmholtz Centre for Environmental Research (UFZ), Halle (Saale), Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle–Jena–Leipzig, Leipzig, Germany
- Thomas Clavel
  - Functional Microbiome Research Group, Institute of Medical Microbiology, RWTH University Hospital, Aachen, Germany
- Alice Carolyn McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Hannover-Braunschweig site, Hannover, Germany
  - Cluster of Excellence RESIST (EXC2155), Hannover Medical School, Hannover, Germany
- Fernando Meyer
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Jörg Overmann
  - Leibniz Institute DSMZ German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
  - Technical University of Braunschweig, Braunschweig, Germany
- Bärbel Stecher
  - Max Von Pettenkofer Institute of Hygiene and Medical Microbiology, Faculty of Medicine, LMU Munich, Munich, Germany
  - German Center for Infection Research (DZIF), Munich, Germany
- Angela Sessitsch
  - Bioresources Unit, AIT Austrian Institute of Technology, Tulln, Austria
- Gabriele Berg
  - Institute of Environmental Biotechnology, Graz University of Technology, Graz, Austria
  - Leibniz-Institute for Agricultural Engineering Potsdam (ATB), Potsdam, Germany
  - University of Potsdam, Potsdam, Germany
3
Durkin L, Jansson T, Sanchez M, Khomich M, Ryberg M, Kristiansson E, Nilsson RH. When mycologists describe new species, not all relevant information is provided (clearly enough). MycoKeys 2020; 72:109-128. [PMID: 32982558 PMCID: PMC7498475 DOI: 10.3897/mycokeys.72.56691] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/19/2020] [Accepted: 08/24/2020] [Indexed: 01/17/2023] Open
Abstract
Taxonomic mycology struggles with what seems to be a perpetual shortage of resources. Logically, fungal taxonomists should therefore leverage every opportunity to highlight and visualize the importance of taxonomic work, the usefulness of taxonomic data far beyond taxonomy, and the integrative and collaborative nature of modern taxonomy at large. Is mycology really doing that, though? In this study, we went through ten years' worth (2009-2018) of species descriptions of extant fungal taxa - 1,097 studies describing at most ten new species - in five major mycological journals plus one plant journal. We estimated the frequency at which a range of key words, illustrations, and concepts related to ecology, geography, taxonomy, molecular data, and data availability were provided with the descriptions. We also considered a range of science-demographical aspects such as gender bias and the rejuvenation of taxonomy and taxonomists as well as public availability of the results. Our results show that the target audience of fungal species descriptions appears to be other fungal taxonomists, because many aspects of the new species were presented only implicitly, if at all. Although many of the parameters we estimated show a gradual, and in some cases marked, change for the better over time, they still paint a somewhat bleak picture of mycological taxonomy as a male-dominated field where the wants and needs of an extended target audience are often not understood or even considered. This study hopes to leave a mark on the way fungal species are described by putting the focus on ways in which fungal taxonomy can better anticipate the end users of species descriptions - be they mycologists, other researchers, the public at large, or even algorithms. In the end, fungal taxonomy, too, is likely to benefit from such measures.
Affiliation(s)
- Louisa Durkin
  - Department of Biological and Environmental Sciences, Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
- Tobias Jansson
  - Department of Biological and Environmental Sciences, Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
- Marisol Sanchez
  - Department of Forest Mycology and Plant Pathology, Uppsala Biocentre, Swedish University of Agricultural Sciences, Uppsala, Sweden
- Maryia Khomich
  - Nofima – Norwegian Institute of Food, Fisheries and Aquaculture Research, P.O. Box 210, 1431 Ås, Norway
- Martin Ryberg
  - Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Erik Kristiansson
  - Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Göteborg, Sweden
- R. Henrik Nilsson
  - Department of Biological and Environmental Sciences, Gothenburg Global Biodiversity Centre, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
4
Braun H, Staal J. Stabilization of the TAK1 adaptor proteins TAB2 and TAB3 is critical for optimal NF-κB activation. FEBS J 2020; 287:3161-3164. [PMID: 31997570 DOI: 10.1111/febs.15210] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/30/2019] [Accepted: 01/09/2020] [Indexed: 02/06/2023]
Abstract
TAB2 and TAB3 bind to K63-linked polyubiquitin chains and recruit the critical kinase MAP3K7 (TAK1). The polyubiquitin-recruited TAK1/TAB2/TAB3 complex comes in close proximity with the IKK (IKKα/IKKβ/IKKγ) complex, which is recruited to M1-linked polyubiquitin chains via the IKKγ (NEMO) component. Together, the two complexes activate the NF-κB family of transcription factors. NF-κB transcription factors are critical mediators of pro-inflammatory signals and must be tightly regulated at multiple levels. Recently, it was discovered that one such point of regulation occurs at the level of TAB2 and TAB3 protein stability by the deubiquitinase USP15. Comment on: https://doi.org/10.1111/febs.15202.
Affiliation(s)
- Harald Braun
  - Department of Biomedical Molecular Biology, Ghent University, Belgium
  - Unit of Molecular Signal Transduction in Inflammation, VIB-UGent Center for Inflammation Research, VIB, Ghent, Belgium
- Jens Staal
  - Department of Biomedical Molecular Biology, Ghent University, Belgium
  - Unit of Molecular Signal Transduction in Inflammation, VIB-UGent Center for Inflammation Research, VIB, Ghent, Belgium
  - Department of Biochemistry and Microbiology, Ghent University, Belgium
5
Stöver BC, Wiechers S, Müller KF. JPhyloIO: a Java library for event-based reading and writing of different phylogenetic file formats through a common interface. BMC Bioinformatics 2019; 20:402. [PMID: 31331268 PMCID: PMC6647125 DOI: 10.1186/s12859-019-2982-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/17/2019] [Accepted: 07/02/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Today a variety of phylogenetic file formats exists, some of which are well-established but limited in their data model, while other more recently introduced ones offer advanced features for metadata representation. Although most currently available software only supports the classical formats with a limited metadata model, it would be desirable to have support for the more advanced formats. This is necessary for users to produce richly annotated data that can be efficiently reused and make underlying workflows easily reproducible. A programming library that abstracts over the data and metadata models of the different formats and allows supporting all of them in one step would significantly simplify the development of new and the extension of existing software to address the need for better metadata annotation. RESULTS: We developed the Java library JPhyloIO, which allows event-based reading and writing of the most common alignment and tree/network formats. It allows full access to all features of the nine currently supported formats. By implementing a single JPhyloIO-based reader and writer, application developers can support all of these formats. Due to the event-based architecture, JPhyloIO can be combined with any application data structure, and is memory efficient for large datasets. JPhyloIO is distributed under LGPL. Detailed documentation and example applications (available on http://bioinfweb.info/JPhyloIO/) significantly lower the entry barrier for bioinformaticians who wish to benefit from JPhyloIO's features in their own software. CONCLUSION: JPhyloIO enables simplified development of new and extension of existing applications that support various standard formats simultaneously. This has the potential to improve interoperability between phylogenetic software tools and at the same time motivate usage of more recent metadata-rich formats such as NeXML or phyloXML.
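The event-based reading described in this abstract can be illustrated with a minimal Python sketch. It is not JPhyloIO's API (JPhyloIO is a Java library, and none of the names below come from it): `Event`, `read_fasta_events`, and `collect_alignment` are hypothetical, and FASTA is chosen here only because it is simple to parse. The point is that the reader emits one event at a time, so the consumer decides which data structure, if any, to build.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """One parsing event (hypothetical type, for illustration only)."""
    kind: str          # "sequence_start", "tokens", or "sequence_end"
    payload: str = ""  # sequence name or a chunk of sequence characters


def read_fasta_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield events line by line instead of loading the whole file."""
    in_sequence = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if in_sequence:
                yield Event("sequence_end")
            yield Event("sequence_start", line[1:].strip())
            in_sequence = True
        elif in_sequence:
            yield Event("tokens", line)
    if in_sequence:
        yield Event("sequence_end")


def collect_alignment(events: Iterable[Event]) -> Dict[str, str]:
    """One possible consumer: assemble the events into a name -> sequence dict."""
    alignment: Dict[str, str] = {}
    current = None
    for event in events:
        if event.kind == "sequence_start":
            current = event.payload
            alignment[current] = ""
        elif event.kind == "tokens" and current is not None:
            alignment[current] += event.payload
    return alignment
```

A consumer that only counts sequences could iterate over the same events without storing any sequence data, which is where the memory efficiency of an event-based design comes from.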
Affiliation(s)
- Ben C Stöver
  - Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany
- Sarah Wiechers
  - Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany
- Kai F Müller
  - Institute for Evolution and Biodiversity, WWU Münster, Hüfferstraße 1, 48149, Münster, Germany
6
Eiserhardt WL, Antonelli A, Bennett DJ, Botigué LR, Burleigh JG, Dodsworth S, Enquist BJ, Forest F, Kim JT, Kozlov AM, Leitch IJ, Maitner BS, Mirarab S, Piel WH, Pérez-Escobar OA, Pokorny L, Rahbek C, Sandel B, Smith SA, Stamatakis A, Vos RA, Warnow T, Baker WJ. A roadmap for global synthesis of the plant tree of life. American Journal of Botany 2018; 105:614-622. [PMID: 29603138 DOI: 10.1002/ajb2.1041] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 10/13/2017] [Accepted: 11/08/2017] [Indexed: 06/08/2023]
Abstract
Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis.
Affiliation(s)
- Wolf L Eiserhardt
  - Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
  - Department of Bioscience, Aarhus University, Ny Munkegade 116, 8000, Aarhus C, Denmark
- Alexandre Antonelli
  - Gothenburg Global Biodiversity Centre, Box 461, 405 30, Gothenburg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
  - Gothenburg Botanical Garden, Carl Skottsbergs Gata 22B, SE-413 19, Gothenburg, Sweden
- Dominic J Bennett
  - Gothenburg Global Biodiversity Centre, Box 461, 405 30, Gothenburg, Sweden
  - Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30, Gothenburg, Sweden
  - Gothenburg Botanical Garden, Carl Skottsbergs Gata 22B, SE-413 19, Gothenburg, Sweden
- Brian J Enquist
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
  - The Santa Fe Institute, Santa Fe, NM, 87501, USA
- Félix Forest
  - Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
- Jan T Kim
  - Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
- Alexey M Kozlov
  - Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118, Heidelberg, Germany
- Ilia J Leitch
  - Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
- Brian S Maitner
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ, 85721, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego, San Diego, CA, 92093, USA
- William H Piel
  - Yale-NUS College, 16 College Avenue West, Singapore, 138527, Republic of Singapore
- Lisa Pokorny
  - Royal Botanic Gardens, Kew, TW9 3AE, Richmond, Surrey, UK
- Carsten Rahbek
  - Center for Macroecology, Evolution and Climate, University of Copenhagen, Universitetsparken 15, DK-2100, Copenhagen O, Denmark
  - Imperial College London, Silwood Park, Buckhurst Road, Ascot, Berkshire, SL5 7PY, UK
- Brody Sandel
  - Department of Biology, Santa Clara University, Santa Clara, CA, 95053, USA
- Stephen A Smith
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Alexandros Stamatakis
  - Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118, Heidelberg, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, 76128, Karlsruhe, Germany
- Rutger A Vos
  - Naturalis Biodiversity Center, P.O. Box 9517, 2300RA, Leiden, The Netherlands
  - Institute of Biology Leiden, P.O. Box 9505, 2300RA, Leiden, The Netherlands
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
7
Griffiths E, Dooley D, Graham M, Van Domselaar G, Brinkman FSL, Hsiao WWL. Context Is Everything: Harmonization of Critical Food Microbiology Descriptors and Metadata for Improved Food Safety and Surveillance. Front Microbiol 2017; 8:1068. [PMID: 28694792 PMCID: PMC5483436 DOI: 10.3389/fmicb.2017.01068] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 02/21/2017] [Accepted: 05/29/2017] [Indexed: 11/18/2022] Open
Abstract
Globalization of food networks increases opportunities for the spread of foodborne pathogens beyond borders and jurisdictions. High resolution whole-genome sequencing (WGS) subtyping of pathogens promises to vastly improve our ability to track and control foodborne disease, but to do so it must be combined with epidemiological, clinical, laboratory and other health care data (called “contextual data”) to be meaningfully interpreted for regulatory and health interventions, outbreak investigation, and risk assessment. However, current multi-jurisdictional pathogen surveillance and investigation efforts are complicated by time-consuming data re-entry, curation and integration of contextual information owing to a lack of interoperable standards and inconsistent reporting. A solution to these challenges is the use of ‘ontologies’ - hierarchies of well-defined and standardized vocabularies interconnected by logical relationships. Terms are specified by universal IDs enabling integration into highly regulated areas and multi-sector sharing (e.g., food and water microbiology with the veterinary sector). Institution-specific terms can be mapped to a given standard at different levels of granularity, maximizing comparability of contextual information according to jurisdictional policies. Fit-for-purpose ontologies provide contextual information with the auditability required for food safety laboratory accreditation. Our research efforts include the development of a Genomic Epidemiology Ontology (GenEpiO), and Food Ontology (FoodOn) that harmonize important laboratory, clinical and epidemiological data fields, as well as existing food resources. These efforts are supported by a global consortium of researchers and stakeholders worldwide. Since foodborne diseases do not respect international borders, uptake of such vocabularies will be crucial for multi-jurisdictional interpretation of WGS results and data sharing.
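The mapping of institution-specific terms to a standard vocabulary at different levels of granularity, as described in this abstract, can be sketched in Python as follows. This is an illustration under stated assumptions, not the GenEpiO or FoodOn data model: every identifier (`EX:0000001` and so on), every label, and both laboratory vocabularies are made-up placeholders.

```python
# Hypothetical standard vocabulary: each term has a universal ID, a label,
# and one parent, forming a simple hierarchy. These are NOT real ontology IDs.
STANDARD_TERMS = {
    "EX:0000001": {"label": "food product", "parent": None},
    "EX:0000002": {"label": "poultry product", "parent": "EX:0000001"},
    "EX:0000003": {"label": "chicken breast", "parent": "EX:0000002"},
}

# Hypothetical institution-specific vocabularies mapped onto the standard IDs.
LOCAL_TO_STANDARD = {
    "lab_a": {"chkn brst": "EX:0000003", "poultry": "EX:0000002"},
    "lab_b": {"Chicken, breast, raw": "EX:0000003"},
}


def to_standard(institution, local_term):
    """Return the standard ID for a local term, or None if it is unmapped."""
    return LOCAL_TO_STANDARD.get(institution, {}).get(local_term)


def lineage(term_id):
    """Return the term followed by all of its ancestors, most specific first."""
    chain = []
    while term_id is not None:
        chain.append(term_id)
        term_id = STANDARD_TERMS[term_id]["parent"]
    return chain


def shared_term(term_a, term_b):
    """Return the most specific standard term at which two records are comparable."""
    lineage_b = set(lineage(term_b))
    for candidate in lineage(term_a):
        if candidate in lineage_b:
            return candidate
    return None
```

Two laboratories that record the same food under different local names resolve to the same ID, and records coded at different levels of detail can still be compared at their closest shared ancestor.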
Affiliation(s)
- Emma Griffiths
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Vancouver, BC, Canada
- Damion Dooley
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Morag Graham
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
  - Department of Medical Microbiology and Infectious Diseases, Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada
- Gary Van Domselaar
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
  - Department of Medical Microbiology and Infectious Diseases, Max Rady College of Medicine, University of Manitoba, Winnipeg, MB, Canada
- Fiona S L Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Vancouver, BC, Canada
- William W L Hsiao
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
  - British Columbia Centre for Disease Control Public Health Laboratory, Vancouver, BC, Canada
8
Höhna S, Landis MJ, Heath TA, Boussau B, Lartillot N, Moore BR, Huelsenbeck JP, Ronquist F. RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language. Syst Biol 2016; 65:726-36. [PMID: 27235697 PMCID: PMC4911942 DOI: 10.1093/sysbio/syw021] [Citation(s) in RCA: 333] [Impact Index Per Article: 41.6] [Received: 04/01/2015] [Revised: 03/02/2015] [Accepted: 03/01/2015] [Indexed: 01/12/2023] Open
Abstract
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.].
Affiliation(s)
- Sebastian Höhna
  - Department of Integrative Biology; Department of Statistics, University of California, Berkeley, CA 94720, USA; Department of Evolution and Ecology, University of California, Davis, CA 95616, USA; Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden
- Tracy A Heath
  - Department of Integrative Biology; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Bastien Boussau
  - Department of Integrative Biology; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France
- Nicolas Lartillot
  - Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France
- Brian R Moore
  - Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Fredrik Ronquist
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden
9
Hinchliff CE, Smith SA, Allman JF, Burleigh JG, Chaudhary R, Coghill LM, Crandall KA, Deng J, Drew BT, Gazis R, Gude K, Hibbett DS, Katz LA, Laughinghouse HD, McTavish EJ, Midford PE, Owen CL, Ree RH, Rees JA, Soltis DE, Williams T, Cranston KA. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc Natl Acad Sci U S A 2015; 112:12764-9. [PMID: 26385966 PMCID: PMC4611642 DOI: 10.1073/pnas.1423041112] [Citation(s) in RCA: 372] [Impact Index Per Article: 41.3] [Indexed: 12/20/2022] Open
Abstract
Reconstructing the phylogenetic relationships that unite all lineages (the tree of life) is a grand challenge. The paucity of homologous character data across disparately related lineages currently renders direct phylogenetic inference untenable. To reconstruct a comprehensive tree of life, we therefore synthesized published phylogenies, together with taxonomic classifications for taxa never incorporated into a phylogeny. We present a draft tree containing 2.3 million tips: the Open Tree of Life. Realization of this tree required the assembly of two additional community resources: (i) a comprehensive global reference taxonomy and (ii) a database of published phylogenetic trees mapped to this taxonomy. Our open source framework facilitates community comment and contribution, enabling the tree to be continuously updated when new phylogenetic and taxonomic data become digitally available. Although data coverage and phylogenetic conflict across the Open Tree of Life illuminate gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point for community contribution. This comprehensive tree will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change, agriculture, and genomics.
Affiliation(s)
- Cody E Hinchliff
  - Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Stephen A Smith
  - Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Ruchi Chaudhary
  - Department of Biology, University of Florida, Gainesville, FL 32611
- Keith A Crandall
  - Computational Biology Institute, George Washington University, Ashburn, VA 20147
- Jiabin Deng
  - Department of Biology, University of Florida, Gainesville, FL 32611
- Bryan T Drew
  - Department of Biology, University of Nebraska-Kearney, Kearney, NE 68849
- Romina Gazis
  - Department of Biology, Clark University, Worcester, MA 01610
- Karl Gude
  - School of Journalism, Michigan State University, East Lansing, MI 48824
- David S Hibbett
  - Department of Biology, Clark University, Worcester, MA 01610
- Laura A Katz
  - Biological Science, Clark Science Center, Smith College, Northampton, MA 01063
- Emily Jane McTavish
  - Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045
- Jonathan A Rees
  - National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
- Douglas E Soltis
  - Department of Biology, University of Florida, Gainesville, FL 32611; Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Tiffani Williams
  - Computer Science and Engineering, Texas A&M University, College Station, TX 77843
- Karen A Cranston
  - National Evolutionary Synthesis Center, Duke University, Durham, NC 27705
10
ReproPhylo: An Environment for Reproducible Phylogenomics. PLoS Comput Biol 2015; 11:e1004447. [PMID: 26335558 PMCID: PMC4559436 DOI: 10.1371/journal.pcbi.1004447] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/27/2015] [Accepted: 07/13/2015] [Indexed: 11/19/2022] Open
Abstract
The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility, and instantaneous repeatability, is built in to the ReproPhylo system and does not require user intervention or configuration because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This ‘single file’ approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. This file, along with a Git repository, are the primary reproducibility outputs of the program. In addition, ReproPhylo produces an extensive human-readable report and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform independent CC0 Python module and is easily installed as a Docker image or a WinPython self-sufficient package, with a Jupyter Notebook GUI, or as a slimmer version in a Galaxy distribution.
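The "single file" idea in this abstract, a workflow stored together with its provenance and environment information, can be sketched in Python as below. This is not ReproPhylo's implementation or API: the `Workflow` class, `run_step`, `save`, `load`, and the `gc_content` step are all hypothetical, and the sketch swaps in a JSON file where the abstract describes a serialized Python object, purely to keep the example portable. Version control with Git is outside the scope of the sketch.

```python
import json
import platform
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _environment():
    """Record the interpreter and operating system the analysis ran on."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}


@dataclass
class Workflow:
    """Hypothetical container: results, provenance, and environment in one object."""
    data: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)
    environment: dict = field(default_factory=_environment)

    def run_step(self, name, func, **params):
        """Run one analysis step and log what was run, with which parameters, and when."""
        self.data[name] = func(**params)
        self.provenance.append({
            "step": name,
            "function": func.__name__,
            "params": params,
            "time": datetime.now(timezone.utc).isoformat(),
        })
        return self.data[name]


def save(workflow, path):
    """Write the whole workflow to a single file (results must be JSON-serializable)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(asdict(workflow), handle, indent=2)


def load(path):
    """Restore a workflow, including its provenance log, from that single file."""
    with open(path, encoding="utf-8") as handle:
        return Workflow(**json.load(handle))


def gc_content(sequence):
    """Toy analysis step used only to demonstrate the logging."""
    return sum(base in "GC" for base in sequence.upper()) / len(sequence)
```

Because each step is logged as it runs, the saved file records how every result was produced without any extra action from the user, which is the property the abstract emphasizes.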
|
11
|
Pope LC, Liggins L, Keyse J, Carvalho SB, Riginos C. Not the time or the place: the missing spatio-temporal link in publicly available genetic data. Mol Ecol 2015; 24:3802-9. [DOI: 10.1111/mec.13254] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 01/04/2015] [Revised: 05/07/2015] [Accepted: 05/22/2015] [Indexed: 11/29/2022]
Affiliation(s)
- Lisa C. Pope
- School of Biological Sciences; The University of Queensland; Brisbane Qld 4072 Australia
| | - Libby Liggins
- Allan Wilson Centre for Molecular Ecology and Evolution; New Zealand Institute for Advanced Study; Institute of Natural and Mathematical Sciences; Massey University; Auckland 0745 New Zealand
- Auckland War Memorial Museum; Tāmaki Paenga Hira; Auckland 1142 New Zealand
| | - Jude Keyse
- School of Biological Sciences; The University of Queensland; Brisbane Qld 4072 Australia
| | - Silvia B Carvalho
- CIBIO/InBIO - Centro de Investigação em Biodiversidade e Recursos Genéticos da Universidade do Porto; R. Padre Armando Quintas 4485-661 Vairão Portugal
| | - Cynthia Riginos
- School of Biological Sciences; The University of Queensland; Brisbane Qld 4072 Australia
| |
|
12
|
Magee AF, May MR, Moore BR. The dawn of open access to phylogenetic data. PLoS One 2014; 9:e110268. [PMID: 25343725 PMCID: PMC4208793 DOI: 10.1371/journal.pone.0110268] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 06/09/2014] [Accepted: 09/09/2014] [Indexed: 01/29/2023] Open
Abstract
The scientific enterprise depends critically on the preservation of and open access to published data. This basic tenet applies acutely to phylogenies (estimates of evolutionary relationships among species). Increasingly, phylogenies are estimated from increasingly large, genome-scale datasets using increasingly complex statistical methods that require increasing levels of expertise and computational investment. Moreover, the resulting phylogenetic data provide an explicit historical perspective that critically informs research in a vast and growing number of scientific disciplines. One such use is the study of changes in rates of lineage diversification (speciation–extinction) through time. As part of a meta-analysis in this area, we sought to collect phylogenetic data (comprising nucleotide sequence alignment and tree files) from 217 studies published in 46 journals over a 13-year period. We document our attempts to procure those data (from online archives and by direct request to corresponding authors), and report results of analyses (using Bayesian logistic regression) to assess the impact of various factors on the success of our efforts. Overall, complete phylogenetic data for [Formula: see text] of these studies are effectively lost to science. Our study indicates that phylogenetic data are more likely to be deposited in online archives and/or shared upon request when: (1) the publishing journal has a strong data-sharing policy; (2) the publishing journal has a higher impact factor; and (3) the data are requested from faculty rather than students. Importantly, our survey spans recent policy initiatives and infrastructural changes; our analyses indicate that the positive impact of these community initiatives has been both dramatic and immediate. Although the results of our study indicate that the situation is dire, our findings also reveal tremendous recent progress in the sharing and preservation of phylogenetic data.
Affiliation(s)
- Andrew F. Magee
- Department of Evolution and Ecology, University of California Davis, Davis, CA, United States of America
| | - Michael R. May
- Department of Evolution and Ecology, University of California Davis, Davis, CA, United States of America
| | - Brian R. Moore
- Department of Evolution and Ecology, University of California Davis, Davis, CA, United States of America
| |
|
13
|
Flores O, Garnier E, Wright IJ, Reich PB, Pierce S, Dìaz S, Pakeman RJ, Rusch GM, Bernard-Verdier M, Testi B, Bakker JP, Bekker RM, Cerabolini BEL, Ceriani RM, Cornu G, Cruz P, Delcamp M, Dolezal J, Eriksson O, Fayolle A, Freitas H, Golodets C, Gourlet-Fleury S, Hodgson JG, Brusa G, Kleyer M, Kunzmann D, Lavorel S, Papanastasis VP, Pérez-Harguindeguy N, Vendramini F, Weiher E. An evolutionary perspective on leaf economics: phylogenetics of leaf mass per area in vascular plants. Ecol Evol 2014; 4:2799-811. [PMID: 25165520 PMCID: PMC4130440 DOI: 10.1002/ece3.1087] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 10/03/2013] [Revised: 03/28/2014] [Accepted: 04/02/2014] [Indexed: 01/20/2023] Open
Abstract
In plant leaves, resource use follows a trade-off between rapid resource capture and conservative storage. This "worldwide leaf economics spectrum" consists of a suite of intercorrelated leaf traits, among which leaf mass per area, LMA, is one of the most fundamental as it indicates the cost of leaf construction and light-interception borne by plants. We conducted a broad-scale analysis of the evolutionary history of LMA across a large dataset of 5401 vascular plant species. The phylogenetic signal in LMA displayed low but significant conservatism, that is, leaf economics tended to be more similar among close relatives than expected by chance alone. Models of trait evolution indicated that LMA evolved under weak stabilizing selection. Moreover, results suggest that different optimal phenotypes evolved among large clades within which extremes tended to be selected against. Conservatism in LMA was strongly related to growth form, as were selection intensity and phenotypic evolutionary rates: woody plants showed higher conservatism in relation to stronger stabilizing selection and lower evolutionary rates compared to herbaceous taxa. The evolutionary history of LMA thus paints different evolutionary trajectories of vascular plant species across clades, revealing the coordination of leaf trait evolution with growth forms in response to varying selection regimes.
Affiliation(s)
- Olivier Flores
- CNRS, Centre d'Écologie Fonctionnelle et Évolutive (CEFE), UMR 5175, 1919 route de Mende, 34293, Montpellier Cedex 5, France
- UMR PVMBT, Université de la Réunion, CIRAD, 7 chemin de l'IRAT, 94710, Saint-Pierre, France
| | - Eric Garnier
- CNRS, Centre d'Écologie Fonctionnelle et Évolutive (CEFE), UMR 5175, 1919 route de Mende, 34293, Montpellier Cedex 5, France
| | - Ian J Wright
- Department of Biological Sciences, Macquarie University, New South Wales, 2109, Australia
| | - Peter B Reich
- Department of Forest Resources and Institute on the Environment, University of Minnesota, St Paul, Minnesota
- Hawkesbury Institute for the Environment, University of Western Sydney, Hawkesbury, New South Wales, Australia
| | - Simon Pierce
- Department of Plant Production, University of Milan, via Celoria 2, I-20133, Milan, Italy
| | - Sandra Dìaz
- Instituto Multidisciplinario de Biología Vegetal (CONICET - UNC) and FCEFyN, Universidad Nacional de Córdoba, Casilla de Correo 495, Vélez Sársfield 299, 5000, Córdoba, Argentina
| | - Robin J Pakeman
- James Hutton Institute, Craigiebuckler, Aberdeen, AB15 8QH, UK
| | - Graciela M Rusch
- Norwegian Institute for Nature Research, Tungasletta 2, 7485, Trondheim, Norway
| | - Maud Bernard-Verdier
- CNRS, Centre d'Écologie Fonctionnelle et Évolutive (CEFE), UMR 5175, 1919 route de Mende, 34293, Montpellier Cedex 5, France
| | - Baptiste Testi
- CNRS, Centre d'Écologie Fonctionnelle et Évolutive (CEFE), UMR 5175, 1919 route de Mende, 34293, Montpellier Cedex 5, France
| | - Jan P Bakker
- Community and Conservation Ecology Group, PO Box 14, 9750, AA Haren, The Netherlands
| | - Renée M Bekker
- Community and Conservation Ecology Group, PO Box 14, 9750, AA Haren, The Netherlands
| | - Bruno E L Cerabolini
- DBSF, Università degli Studi dell'Insubria, Via J.H. Dunant 3, I-21100, Varese, Italy
| | - Roberta M Ceriani
- Centro Flora Autoctona, c/o Consorzio Parco Monte Barro, via Bertarelli 11, I-23851, Galbiate (LC), Italy
| | - Guillaume Cornu
- UR B&SEF CIRAD, TA C-105/D, Campus International de Baillarguet, 34398, Montpellier Cedex 5, France
| | - Pablo Cruz
- INRA UMR 1248 AGIR, Equipe ORPHEE, BP 52627 - Auzeville, 31326, Castanet-Tolosan, France
| | - Matthieu Delcamp
- UR B&SEF CIRAD, TA C-105/D, Campus International de Baillarguet, 34398, Montpellier Cedex 5, France
| | - Jiri Dolezal
- Institute of Botany, Academy of Sciences of the Czech Republic, Dukelská 135, CZ-37982, Třeboň, Czech Republic
| | - Ove Eriksson
- Department of Botany, Stockholm University, Stockholm, 106 91, Sweden
| | - Adeline Fayolle
- UR B&SEF CIRAD, TA C-105/D, Campus International de Baillarguet, 34398, Montpellier Cedex 5, France
| | - Helena Freitas
- Centre for Functional Ecology, University of Coimbra, Coimbra, Portugal
| | - Carly Golodets
- Department of Molecular Biology and Ecology of Plants, Faculty of Life Sciences, Tel Aviv University, Tel Aviv, 69978, Israel
| | - Sylvie Gourlet-Fleury
- UR B&SEF CIRAD, TA C-105/D, Campus International de Baillarguet, 34398, Montpellier Cedex 5, France
| | - John G Hodgson
- Department of Archaeology, The University, Sheffield, S1 4ET, UK
| | - Guido Brusa
- DBSF, Università degli Studi dell'Insubria, Via J.H. Dunant 3, I-21100, Varese, Italy
| | - Michael Kleyer
- Landscape Ecology Group, Carl von Ossietzky University of Oldenburg, P.O. Box 2503, 26111, Oldenburg, Germany
| | - Dieter Kunzmann
- Landscape Ecology Group, Carl von Ossietzky University of Oldenburg, P.O. Box 2503, 26111, Oldenburg, Germany
- Landscape Ecology & Consulting, Lerchenstrasse 20, 26215, Wiefelstede, Germany
| | - Sandra Lavorel
- Laboratoire d'Écologie Alpine (CNRS UMR 5553) and Station Alpine Joseph Fourier (UMS-UJF-CNRS 2925), Université Joseph Fourier, BP 53, F-38042, Grenoble, Cedex 09, France
| | - Vasilios P Papanastasis
- Laboratory of Rangeland Ecology, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
| | - Natalia Pérez-Harguindeguy
- Instituto Multidisciplinario de Biología Vegetal (CONICET - UNC) and FCEFyN, Universidad Nacional de Córdoba, Casilla de Correo 495, Vélez Sársfield 299, 5000, Córdoba, Argentina
| | - Fernanda Vendramini
- Instituto Multidisciplinario de Biología Vegetal (CONICET - UNC) and FCEFyN, Universidad Nacional de Córdoba, Casilla de Correo 495, Vélez Sársfield 299, 5000, Córdoba, Argentina
| | - Evan Weiher
- Department of Biology, University of Wisconsin-Eau Claire, Phillips Hall 353, Eau Claire, Wisconsin, 4702-4004
| |
|
14
|
Cranston K, Harmon LJ, O'Leary MA, Lisle C. Best practices for data sharing in phylogenetic research. PLOS CURRENTS 2014; 6:ecurrents.tol.bf01eff4a6b60ca4825c69293dc59645. [PMID: 24987572 PMCID: PMC4073804 DOI: 10.1371/currents.tol.bf01eff4a6b60ca4825c69293dc59645] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 11/23/2022]
Abstract
As phylogenetic data become increasingly available, along with associated data on species' genomes, traits, and geographic distributions, the need to ensure data availability and reuse becomes more and more acute. In this paper, we provide ten "simple rules" that we view as best practices for data sharing in phylogenetic research. These rules will help lead towards a future phylogenetics where data can easily be archived, shared, reused, and repurposed across a wide variety of projects.
Affiliation(s)
- Karen Cranston
- National Evolutionary Synthesis Center, Duke University, Durham, North Carolina, USA
| | - Luke J Harmon
- Department of Biological Sciences, University of Idaho, Moscow, Idaho, USA
| | - Maureen A O'Leary
- Department of Anatomical Sciences, Stony Brook University, Stony Brook, New York, USA
| | | |
|
15
|
Southan C, Hancock JM. A tale of two drug targets: the evolutionary history of BACE1 and BACE2. Front Genet 2013; 4:293. [PMID: 24381583 PMCID: PMC3865767 DOI: 10.3389/fgene.2013.00293] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 09/21/2013] [Accepted: 11/29/2013] [Indexed: 11/22/2022] Open
Abstract
The beta amyloid (APP) cleaving enzyme (BACE1) has been a drug target for Alzheimer's Disease (AD) since 1999 with lead inhibitors now entering clinical trials. In 2011, the paralog, BACE2, became a new target for type II diabetes (T2DM) having been identified as a TMEM27 secretase regulating pancreatic β cell function. However, the normal roles of both enzymes are unclear. This study outlines their evolutionary history and new opportunities for functional genomics. We identified 30 homologs (UrBACEs) in basal phyla including Placozoans, Cnidarians, Choanoflagellates, Porifera, Echinoderms, Annelids, Mollusks and Ascidians (but not Ecdysozoans). UrBACEs are predominantly single copy, show 35-45% protein sequence identity with mammalian BACE1, are ~100 residues longer than cathepsin paralogs with an aspartyl protease domain flanked by a signal peptide and a C-terminal transmembrane domain. While multiple paralogs in Trichoplax and Monosiga pre-date the nervous system, duplication of the UrBACE in fish gave rise to BACE1 and BACE2 in the vertebrate lineage. The latter evolved more rapidly as the former maintained the emergent neuronal role. In mammals, Ka/Ks for BACE2 is higher than BACE1 but low ratios for both suggest purifying selection. The 5' exons show higher Ka/Ks than the catalytic section. Model organism genomes show the absence of certain BACE human substrates when the UrBACE is present. Experiments could thus reveal undiscovered substrates and roles. The human protease double-target status means that evolutionary trajectories and functional shifts associated with different substrates will have implications for the development of clinical candidates for both AD and T2DM. A rational basis for inhibition specificity ratios and assessing target-related side effects will be facilitated by a more complete picture of BACE1 and BACE2 functions informed by their evolutionary context.
Affiliation(s)
- Christopher Southan
- IUPHAR Database and Guide to Pharmacology Web Portal Group, University/BHF Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
| | - John M. Hancock
- Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
| |
|
16
|
Panahiazar M, Sheth AP, Ranabahu A, Vos RA, Leebens-Mack J. Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach. BMC Med Genomics 2013; 6 Suppl 3:S5. [PMID: 24565381 PMCID: PMC3980757 DOI: 10.1186/1755-8794-6-s3-s5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022] Open
Abstract
Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can promote phyloinformatic efforts to promote accessibility of trees and underlying metadata. PhylOnt is an extensible ontology with concepts describing tree types and tree building methodologies including estimation methods, models and programs. In addition we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogenetic related documents with PhylOnt Ontology. This approach advances data reuse in phyloinformatics.
|
17
|
Vogt L. eScience and the need for data standards in the life sciences: in pursuit of objectivity rather than truth. SYST BIODIVERS 2013. [DOI: 10.1080/14772000.2013.818588] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
|
18
|
Stoltzfus A, Lapp H, Matasci N, Deus H, Sidlauskas B, Zmasek CM, Vaidya G, Pontelli E, Cranston K, Vos R, Webb CO, Harmon LJ, Pirrung M, O'Meara B, Pennell MW, Mirarab S, Rosenberg MS, Balhoff JP, Bik HM, Heath TA, Midford PE, Brown JW, McTavish EJ, Sukumaran J, Westneat M, Alfaro ME, Steele A, Jordan G. Phylotastic! Making tree-of-life knowledge accessible, reusable and convenient. BMC Bioinformatics 2013; 14:158. [PMID: 23668630 PMCID: PMC3669619 DOI: 10.1186/1471-2105-14-158] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/18/2013] [Accepted: 04/30/2013] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image. 
CONCLUSIONS Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.
Affiliation(s)
- Arlin Stoltzfus
- Institute for Bioscience and Biotechnology Research (IBBR), Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
| | - Hilmar Lapp
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Naim Matasci
- The iPlant Collaborative and EEB Department, University of Arizona, 1657 E Helen St, Tucson, AZ, 85721, USA
| | - Helena Deus
- Digital Enterprise Research Institute, National University of Ireland, University Road, Galway, Ireland
| | - Brian Sidlauskas
- Department of Fisheries and Wildlife, Oregon State University, 104 Nash Hall, Corvallis, OR, 97331-3803, USA
| | - Christian M Zmasek
- Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA, 92037, USA
| | - Gaurav Vaidya
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, 80309-0334, USA
| | - Enrico Pontelli
- Department of Computer Science, New Mexico State University, MSC CS, Box 30001, Las Cruces, NM, 88003, USA
| | - Karen Cranston
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Rutger Vos
- NCB Naturalis, Einsteinweg 2, Leiden, 2333 CC, the Netherlands
| | - Campbell O Webb
- Arnold Arboretum of Harvard University, Boston, MA, 02130, USA
| | - Luke J Harmon
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
| | - Megan Pirrung
- University of Colorado Denver Anschutz Medical Campus, Aurora, CO, 80045, USA
| | - Brian O'Meara
- Department of Ecology & Evolutionary Biology, 569 Dabney Hall, University of Tennessee, Knoxville, TN, 37996, USA
| | - Matthew W Pennell
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
| | - Siavash Mirarab
- Department of Computer Science, University of Texas at Austin, Austin, TX, 78701, USA
| | - Michael S Rosenberg
- Center for Evolutionary Medicine and Informatics, The Biodesign Institute, and School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA
| | - James P Balhoff
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Holly M Bik
- UC Davis Genome Center, One Shields Ave, Davis, CA, 95618, USA
| | - Tracy A Heath
- Department of Integrative Biology, University of California, Berkeley, CA, 94720-3140, USA
| | - Peter E Midford
- National Evolutionary Synthesis Center, 2024 W. Main St, Durham, NC, 27705, USA
| | - Joseph W Brown
- Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, PO Box 443051, Moscow, ID, 83844-3051, USA
| | | | - Jeet Sukumaran
- Biology Department, Duke University, Biological Sciences Building, 125 Science Drive, Durham, NC, 27708, USA
| | - Mark Westneat
- Biodiversity Synthesis Center, Field Museum of Natural History, 1400 S Lakeshore Dr, Chicago, IL, 60605, USA
| | - Michael E Alfaro
- Department of Ecology and Evolutionary Biology, South University of California Los Angeles, 621 Charles E. Young Dr, Los Angeles, CA, 90095, USA
| | - Aaron Steele
- U.C. Berkeley Museum of Vertebrate Zoology, University of California, 3101 Valley Life Sciences Building, Berkeley, CA, 94720, USA
| | - Greg Jordan
- Paperpile, 34 Houghton Street, Somerville, MA, 02143, USA
| |
|
19
|
Stoltzfus A, O'Meara B, Whitacre J, Mounce R, Gillespie EL, Kumar S, Rosauer DF, Vos RA. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Res Notes 2012; 5:574. [PMID: 23088596 PMCID: PMC3583491 DOI: 10.1186/1756-0500-5-574] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 04/27/2012] [Accepted: 08/24/2012] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use. FINDINGS Here we summarize results of an ongoing analysis of current practices for archiving phylogenetic trees and associated data, current practices of re-use, and current barriers to re-use. We find that the technical infrastructure is available to support rudimentary archiving, but the frequency of archiving is low. Currently, most phylogenetic knowledge is not easily re-used due to a lack of archiving, lack of awareness of best practices, and lack of community-wide standards for formatting data, naming entities, and annotating data. Most attempts at data re-use seem to end in disappointment. Nevertheless, we find many positive examples of data re-use, particularly those that involve customized species trees generated by grafting to, and pruning from, a much larger tree. CONCLUSIONS The technologies and practices that facilitate data re-use can catalyze synthetic and integrative research. However, success will require engagement from various stakeholders including individual scientists who produce or consume shareable data, publishers, policy-makers, technology developers and resource-providers. 
The critical challenges for facilitating re-use of phylogenetic trees and associated data, we suggest, include: a broader commitment to public archiving; more extensive use of globally meaningful identifiers; development of user-friendly technology for annotating, submitting, searching, and retrieving data and their metadata; and development of a minimum reporting standard (MIAPA) indicating which kinds of data and metadata are most important for a re-useable phylogenetic record.
Affiliation(s)
- Arlin Stoltzfus
- Biochemical Science Division, NIST, 100 Bureau Drive, Gaithersburg, MD, USA
| | - Brian O'Meara
- Department of Ecology & Evolutionary Biology, University of Tennessee, 569 Dabney Hall, Knoxville, TN, 37996-1610, USA
| | - Jamie Whitacre
- NMNH, Smithsonian Institution, Washington, DC, 20013-7012, USA
| | - Ross Mounce
- Department of Biology and Biochemistry, University of Bath, Bath, UK
| | | | - Sudhir Kumar
- Center for Evolutionary Medicine and Informatics, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, AZ, 85287-5301, USA
| | - Dan F Rosauer
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
| | - Rutger A Vos
- NCB Naturalis, Einsteinweg 2, 2333 CC, Leiden, the Netherlands
| |
|
20
|
Krishnan NM, Pattnaik S, Jain P, Gaur P, Choudhary R, Vaidyanathan S, Deepak S, Hariharan AK, Krishna PB, Nair J, Varghese L, Valivarthi NK, Dhas K, Ramaswamy K, Panda B. A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica. BMC Genomics 2012; 13:464. [PMID: 22958331 PMCID: PMC3507787 DOI: 10.1186/1471-2164-13-464] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 07/24/2012] [Accepted: 09/03/2012] [Indexed: 12/05/2022] Open
Abstract
BACKGROUND The Azadirachta indica (neem) tree is a source of a wide range of natural products, including the potent biopesticide azadirachtin. In spite of its widespread applications in agriculture and medicine, the molecular aspects of the biosynthesis of neem terpenoids remain largely unexplored. The current report describes the draft genome and four transcriptomes of A. indica and attempts to contextualise the sequence information in terms of its molecular phylogeny, transcript expression and terpenoid biosynthesis pathways. A. indica is the first member of the family Meliaceae to be sequenced using a next-generation sequencing approach. RESULTS The genome and transcriptomes of A. indica were sequenced using multiple sequencing platforms and libraries. The A. indica genome is AT-rich, bears few repetitive DNA elements and comprises about 20,000 genes. The molecular phylogenetic analyses grouped A. indica together with Citrus sinensis from the Rutaceae family, validating its conventional taxonomic classification. Comparative transcript expression analysis showed either exclusive or enhanced expression of known genes involved in neem terpenoid biosynthesis pathways compared to other sequenced angiosperms. Genome and transcriptome analyses in A. indica led to the identification of repeat elements, nucleotide composition and expression profiles of genes in various organs. CONCLUSIONS This study on the A. indica genome and transcriptomes will provide a model for characterization of metabolic pathways involved in synthesis of bioactive compounds and for comparative evolutionary studies among various Meliaceae family members, and will help annotate their genomes. A better understanding of the molecular pathways involved in azadirachtin synthesis in A. indica will pave the way for bulk production of environmentally friendly biopesticides.
Affiliation(s)
- Neeraja M Krishnan
- Ganit Labs, Bio-IT Centre, Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore 560100, India
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
21
|
Talevich E, Invergo BM, Cock PJA, Chapman BA. Bio.Phylo: a unified toolkit for processing, analyzing and visualizing phylogenetic trees in Biopython. BMC Bioinformatics 2012; 13:209. [PMID: 22909249 PMCID: PMC3468381 DOI: 10.1186/1471-2105-13-209] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.9] [Received: 03/06/2012] [Accepted: 08/08/2012] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines. RESULTS We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source. CONCLUSIONS Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.
Affiliation(s)
- Eric Talevich
- Institute of Bioinformatics, University of Georgia, 120 Green Street, Athens, GA 30602, USA
- Brandon M Invergo
- Institute of Evolutionary Biology (CSIC-UPF), CEXS-UPF-PRBB, C/ Doctor Aiguader 88, 08003 Barcelona, Spain
- Peter JA Cock
- James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Brad A Chapman
- Harvard School of Public Health Bioinformatics Core, 655 Huntington Ave, Boston, MA 02115, USA
|
22
|
Vos RA, Balhoff JP, Caravas JA, Holder MT, Lapp H, Maddison WP, Midford PE, Priyam A, Sukumaran J, Xia X, Stoltzfus A. NeXML: rich, extensible, and verifiable representation of comparative data and metadata. Syst Biol 2012; 61:675-89. [PMID: 22357728 PMCID: PMC3376374 DOI: 10.1093/sysbio/sys025] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 05/16/2011] [Revised: 07/29/2011] [Accepted: 02/07/2012] [Indexed: 12/13/2022] Open
Abstract
In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input-output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
|
23
|
Daly M, Herendeen PS, Guralnick RP, Westneat MW, McDade L. Systematics Agenda 2020: the mission evolves. Syst Biol 2012; 61:549-52. [PMID: 22492540 PMCID: PMC3376376 DOI: 10.1093/sysbio/sys044] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022] Open
Affiliation(s)
- Marymegan Daly
- Department of Evolution, Ecology and Organismal Biology, The Ohio State University, 1315 Kinnear Road, Columbus, OH 43212, USA.
|
24
|
Laubach T, von Haeseler A, Lercher MJ. TreeSnatcher plus: capturing phylogenetic trees from images. BMC Bioinformatics 2012; 13:110. [PMID: 22624611 PMCID: PMC3411374 DOI: 10.1186/1471-2105-13-110] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/21/2011] [Accepted: 05/24/2012] [Indexed: 11/26/2022] Open
Abstract
Background Figures of phylogenetic trees are widely used to illustrate the result of evolutionary analyses. However, one cannot easily extract a machine-readable representation from such images. Therefore, new software emerges that helps to preserve phylogenies digitally for future research. Results TreeSnatcher Plus is a GUI-driven JAVA application that semi-automatically generates a Newick format for multifurcating, arbitrarily shaped, phylogenetic trees contained in pixel images. It offers a range of image pre-processing methods and detects the topology of a depicted tree with adequate user assistance. The user supervises the recognition process, makes corrections to the image and to the topology and repeats steps if necessary. At the end TreeSnatcher Plus produces a Newick tree code optionally including branch lengths for rectangular and freeform trees. Conclusions Although illustrations of phylogenies exist in a vast number of styles, TreeSnatcher Plus imposes no limitations on the images it can process with adequate user assistance. Given that a fully automated digitization of all figures of phylogenetic trees is desirable but currently unrealistic, TreeSnatcher Plus is the only program that reliably facilitates at least a semi-automatic conversion from such figures into a machine-readable format.
Affiliation(s)
- Thomas Laubach
- Department of Bioinformatics, Heinrich-Heine-University Duesseldorf, Universitaetsstrasse 1, Duesseldorf 40225, Germany.
|
25
|
Parr CS, Guralnick R, Cellinese N, Page RD. Evolutionary informatics: unifying knowledge about the diversity of life. Trends Ecol Evol 2012; 27:94-103. [PMID: 22154516 DOI: 10.1016/j.tree.2011.11.001] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Received: 07/17/2011] [Revised: 10/31/2011] [Accepted: 11/01/2011] [Indexed: 01/23/2023]
|
26
|
Kettner C, Field D, Sansone SA, Taylor C, Aerts J, Binns N, Blake A, Britten CM, de Marco A, Fostel J, Gaudet P, González-Beltrán A, Hardy N, Hellemans J, Hermjakob H, Juty N, Leebens-Mack J, Maguire E, Neumann S, Orchard S, Parkinson H, Piel W, Ranganathan S, Rocca-Serra P, Santarsiero A, Shotton D, Sterk P, Untergasser A, Whetzel PL. Meeting Report from the Second "Minimum Information for Biological and Biomedical Investigations" (MIBBI) workshop. Stand Genomic Sci 2010; 3:259-66. [PMID: 21304730 PMCID: PMC3035314 DOI: 10.4056/sigs.147362] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022] Open
Abstract
This report summarizes the proceedings of the second workshop of the 'Minimum Information for Biological and Biomedical Investigations' (MIBBI) consortium held on Dec 1-2, 2010 in Rüdesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role of MIBBI in MI checklist development was discussed. The meeting featured some thirty presentations, wide-ranging discussions and breakout groups. The top outcomes of the two-day workshop as defined by the participants were: 1) the chance to share best practices and to identify areas of synergy; 2) defining a series of tasks for updating the MIBBI Portal; 3) reemphasizing the need to maintain independent MI checklists for various communities while leveraging common terms and workflow elements contained in multiple checklists; and 4) revision of the concept of the MIBBI Foundry to focus on the creation of a core set of MIBBI modules intended for reuse by individual MI checklist projects while maintaining the integrity of each MI project. Further information about MIBBI and its range of activities can be found at http://mibbi.org/.
Affiliation(s)
- Dawn Field
- Centre for Ecology & Hydrology, Oxfordshire UK
- Chris Taylor
- The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Jan Aerts
- Faculty of Engineering - ESAT/SCD, Leuven University, Leuven-Heverlee, Belgium
- Nigel Binns
- Division of Pathway Medicine, University of Edinburgh Medical School, Edinburgh, UK
- Andrew Blake
- MRC Harwell, Harwell Science and Innovation Campus, Oxfordshire, UK
- Cedrik M. Britten
- Medical Department, University Medical Center, Johannes Gutenberg University-Mainz, Mainz, DE
- Ario de Marco
- Consortium for Genomic Technology, Milano, Italy
- University of Nova Gorica, Nova Gorica, Slovenia
- Alejandra González-Beltrán
- Computational and Systems Medicine and Department of Computer Science, University College London, London, UK
- Nigel Hardy
- Department of Computer Science, Aberystwyth University, Aberystwyth, UK
- Jan Hellemans
- Center for Medical Genetics, Ghent University, Ghent, Belgium
- Henning Hermjakob
- The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Nick Juty
- The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Jim Leebens-Mack
- Department of Plant Biology, University of Georgia, Athens, GA, U.S.A
- Eamonn Maguire
- University of Oxford, Oxford e-Research Centre, Oxfordshire, UK
- Steffen Neumann
- Department of Stress- and Developmental Biology, Institute for Plant Biochemistry, Halle, DE
- Sandra Orchard
- The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Helen Parkinson
- The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- William Piel
- Peabody Museum of Natural History, Yale University, New Haven, CT, U.S.A
- Shoba Ranganathan
- Macquarie University, Sydney NSW, Australia
- National University of Singapore, Singapore
- Annapaola Santarsiero
- The Mario Negri Institute for Pharmacological Research, Cancer Pharmacology, 20156 Milan, Italy
- David Shotton
- Image Bioinformatics Research Group, Department of Zoology, University of Oxford, Oxford, UK
- Peter Sterk
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
- Andreas Untergasser
- Zentrum für Molekulare Biologie der Universität Heidelberg, Heidelberg, Germany
- Patricia L. Whetzel
- The National Center for Biomedical Ontology / Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, U.S.A
|
27
|
Advancing standards for bioinformatics activities: persistence, reproducibility, disambiguation and Minimum Information About a Bioinformatics investigation (MIABi). BMC Genomics 2010; 11 Suppl 4:S27. [PMID: 21143811 PMCID: PMC3005918 DOI: 10.1186/1471-2164-11-s4-s27] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
The 2010 International Conference on Bioinformatics, InCoB2010, which is the annual conference of the Asia-Pacific Bioinformatics Network (APBioNet), has agreed to publish conference papers in compliance with the Minimum Information about a Bioinformatics investigation (MIABi) guidelines proposed in June 2009. Authors of the conference supplements in BMC Bioinformatics, BMC Genomics and Immunome Research have consented to cooperate in this process, which will include the procedures described herein, where appropriate, to ensure data and software persistence and perpetuity, database and resource re-instantiability and reproducibility of results, author and contributor identity disambiguation and MIABi-compliance. Wherever possible, datasets and databases will be submitted to depositories with standardized terminologies. As standards are evolving, this process is intended as a prelude to the 100 BioDatabases (BioDB100) initiative whereby APBioNet collaborators will contribute exemplar databases to demonstrate the feasibility of standards-compliance and participate in refining the process for peer-review of such publications and validation of scientific claims and standards compliance. This testbed represents another step in advancing standards-based processes in the bioinformatics community, which is essential to the growing interoperability of biological data, information, knowledge and computational resources.
|
28
|
Holmes C, McDonald F, Jones M, Ozdemir V, Graham JE. Standardization and omics science: technical and social dimensions are inseparable and demand symmetrical study. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2010; 14:327-32. [PMID: 20455752 DOI: 10.1089/omi.2010.0022] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Standardization is critical to scientists and regulators to ensure the quality and interoperability of research processes, as well as the safety and efficacy of the attendant research products. This is perhaps most evident in the case of "omics science," which is enabled by a host of diverse high-throughput technologies such as genomics, proteomics, and metabolomics. But standards are of interest to (and shaped by) others far beyond the immediate realm of individual scientists, laboratories, scientific consortia, or governments that develop, apply, and regulate them. Indeed, scientific standards have consequences for the social, ethical, and legal environment in which innovative technologies are regulated, and thereby command the attention of policy makers and citizens. This article argues that standardization of omics science is both technical and social. A critical synthesis of the social science literature indicates that: (1) standardization requires a degree of flexibility to be practical at the level of scientific practice in disparate sites; (2) the manner in which standards are created, and by whom, will impact their perceived legitimacy and therefore their potential to be used; and (3) the process of standardization itself is important to establishing the legitimacy of an area of scientific research.
Affiliation(s)
- Christina Holmes
- Technoscience and Regulation Research Unit, Faculty of Medicine, Dalhousie University, Halifax, Canada
|
29
|
Langille MGI, Eisen JA. BioTorrents: a file sharing service for scientific data. PLoS One 2010; 5:e10071. [PMID: 20418944 PMCID: PMC2854681 DOI: 10.1371/journal.pone.0010071] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 01/16/2010] [Accepted: 03/17/2010] [Indexed: 11/19/2022] Open
Abstract
The transfer of scientific data has emerged as a significant challenge, as datasets continue to grow in size and demand for open access sharing increases. Current methods for file transfer do not scale well for large files and can cause long transfer times. In this study we present BioTorrents, a website that allows open access sharing of scientific data and uses the popular BitTorrent peer-to-peer file sharing technology. BioTorrents allows files to be transferred rapidly due to the sharing of bandwidth across multiple institutions and provides more reliable file transfers due to the built-in error checking of the file sharing technology. BioTorrents contains multiple features, including keyword searching, category browsing, RSS feeds, torrent comments, and a discussion forum. BioTorrents is available at http://www.biotorrents.net.
Affiliation(s)
- Morgan G I Langille
- Genome Center, University of California Davis, Davis, California, United States of America.
|
30
|
Cochrane GR, Galperin MY. The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources. Nucleic Acids Res 2009; 38:D1-4. [PMID: 19965766 PMCID: PMC2808992 DOI: 10.1093/nar/gkp1077] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Indexed: 11/18/2022] Open
Abstract
The current issue of Nucleic Acids Research includes descriptions of 58 new and 73 updated data resources. The accompanying online Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, now lists 1230 carefully selected databases covering various aspects of molecular and cell biology. While most data resource descriptions remain very brief, the issue includes several longer papers that highlight recent significant developments in such databases as Pfam, MetaCyc, UniProt, ELM and PDBe. The databases described in the Database Issue and Database Collection, however, are far more than a distinct set of resources; they form a network of connected data, concepts and shared technology. The full content of the Database Issue is available online at the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
Affiliation(s)
- Guy R Cochrane
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
31
|
Han MV, Zmasek CM. phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics 2009; 10:356. [PMID: 19860910 PMCID: PMC2774328 DOI: 10.1186/1471-2105-10-356] [Citation(s) in RCA: 386] [Impact Index Per Article: 25.7] [Received: 04/08/2009] [Accepted: 10/27/2009] [Indexed: 11/26/2022] Open
Abstract
Background Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and oftentimes support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information, genome-related data, such as gene names and functional annotations, as well as events such as gene duplications, speciations, or exon shufflings, combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacities to incorporate such annotations of different data types. Results We developed an XML language, named phyloXML, for describing evolutionary trees, as well as various associated data items. PhyloXML provides elements for commonly used items, such as branch lengths, support values, taxonomic names, and gene names and identifiers. By using "property" elements, phyloXML can be adapted to novel and unforeseen use cases. We also developed various software tools for reading, writing, conversion, and visualization of phyloXML formatted data. Conclusion PhyloXML is an XML language defined by a complete schema in XSD that allows storing and exchanging the structures of evolutionary trees as well as associated data. More information about phyloXML itself, the XSD schema, as well as tools implementing and supporting phyloXML, is available at .
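As an illustration of the kind of annotated tree the abstract describes, the following is a minimal sketch that builds a small phyloXML-style document with Python's standard library. The element names (`phylogeny`, `clade`, `name`, `branch_length`, `confidence`) and the namespace URI follow the phyloXML schema as commonly documented; the helper `add_clade`, the taxon labels and all numeric values are hypothetical, and the output should be validated against the XSD before real use.

```python
import xml.etree.ElementTree as ET

# Namespace URI commonly used by phyloXML documents (check against the XSD).
PHYLOXML_NS = "http://www.phyloxml.org"


def add_clade(parent, name=None, branch_length=None, support=None):
    """Hypothetical helper: append a <clade> with optional annotations."""
    node = ET.SubElement(parent, "clade")
    if name is not None:
        ET.SubElement(node, "name").text = name
    if branch_length is not None:
        ET.SubElement(node, "branch_length").text = str(branch_length)
    if support is not None:
        # Support values are stored in a typed <confidence> element.
        conf = ET.SubElement(node, "confidence", type="bootstrap")
        conf.text = str(support)
    return node


# Build the tree ((A,B),C) with made-up branch lengths and one support value.
root = ET.Element("phyloxml", xmlns=PHYLOXML_NS)
phylogeny = ET.SubElement(root, "phylogeny", rooted="true")
top = add_clade(phylogeny)
inner = add_clade(top, branch_length=0.06, support=89)
add_clade(inner, name="A", branch_length=0.102)
add_clade(inner, name="B", branch_length=0.23)
add_clade(top, name="C", branch_length=0.4)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Nesting `clade` elements mirrors the tree topology directly, which is what lets annotations such as support values sit on the branch they describe rather than being encoded in a label string.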
Affiliation(s)
- Mira V Han
- School of Informatics, Indiana University, Bloomington, IN 47408, USA.
|
32
|
Abstract
Meta-analysis has contributed substantially to shifting paradigms in ecology and has become the primary method for quantitatively synthesizing published research. However, an emerging challenge is the lack of a statistical protocol to synthesize studies and evaluate sources of bias while simultaneously accounting for phylogenetic nonindependence of taxa. Phylogenetic nonindependence arises from homology, the similarity of taxa due to shared ancestry, and treating related taxa as independent data violates assumptions of statistics. Given that an explicit goal of meta-analysis is to generalize research across a broad range of taxa, then phylogenetic nonindependence may threaten conclusions drawn from such reviews. Here I outline a statistical framework that integrates phylogenetic information into conventional meta-analysis when (a) taking a weighted average of effect sizes using fixed- and random-effects models and (b) testing for homogeneity of variances. I also outline how to test evolutionary hypotheses with meta-analysis by describing a method that evaluates phylogenetic conservatism and a model-selection framework that competes neutral and adaptive hypotheses to explain variation in meta-analytical data. Finally, I address several theoretical and practical issues relating to the application and availability of phylogenetic information for meta-analysis.
Affiliation(s)
- Marc J Lajeunesse
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York 14853, USA.
|
33
|
Johne R, Müller H, Rector A, van Ranst M, Stevens H. Rolling-circle amplification of viral DNA genomes using phi29 polymerase. Trends Microbiol 2009; 17:205-11. [DOI: 10.1016/j.tim.2009.02.004] [Citation(s) in RCA: 143] [Impact Index Per Article: 9.5] [Received: 10/10/2008] [Revised: 01/16/2009] [Accepted: 02/25/2009] [Indexed: 12/01/2022]
|
34
|
|
35
|
Prosdocimi F, Chisham B, Pontelli E, Stoltzfus A, Thompson JD. Knowledge Standardization in Evolutionary Biology: The Comparative Data Analysis Ontology. Evol Biol 2009. [DOI: 10.1007/978-3-642-00952-5_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
|
36
|
Cardona G, Rosselló F, Valiente G. Extended Newick: it is time for a standard representation of phylogenetic networks. BMC Bioinformatics 2008; 9:532. [PMID: 19077301 PMCID: PMC2621367 DOI: 10.1186/1471-2105-9-532] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.6] [Received: 08/26/2008] [Accepted: 12/15/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Phylogenetic trees resulting from molecular phylogenetic analysis are available in Newick format from specialized databases but when it comes to phylogenetic networks, which provide an explicit representation of reticulate evolutionary events such as recombination, hybridization or lateral gene transfer, the lack of a standard format for their representation has hindered the publication of explicit phylogenetic networks in the specialized literature and their incorporation in specialized databases. Two different proposals to represent phylogenetic networks exist: as a single Newick string (where each hybrid node is split once for each parent) or as a set of Newick strings (one for each hybrid node plus another one for the phylogenetic network). RESULTS The standard we advocate as extended Newick format describes a whole phylogenetic network with k hybrid nodes as a single Newick string with k repeated nodes, and this representation is unique once the phylogenetic network is drawn or the ordering among children nodes is fixed. The extended Newick format facilitates phylogenetic data sharing and exchange, and also allows for the practical use of phylogenetic networks in computer programs and scripts. This standard has been recently agreed upon by a number of computational biologists, is already supported by several phylogenetic tools, and avoids the different drawbacks of using an a priori unknown number of Newick strings without any additional mark-up to represent a phylogenetic network. CONCLUSION The adoption of the extended Newick format as a standard for the representation of phylogenetic networks is an important step towards the publication of explicit phylogenetic networks in peer-reviewed journals and their incorporation in a future database of published phylogenetic networks.
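To make the "k hybrid nodes as k repeated nodes" idea concrete, here is a minimal sketch that counts repeated hybrid-node markers in a single extended Newick string. It assumes the marker convention of a `#` followed by an optional letter tag and an integer identifier (for example `#H1`); the example network, the regular expression and the function name `hybrid_occurrences` are illustrative assumptions, not taken from the paper.

```python
import re
from collections import Counter

# Assumed marker shape: '#', optional letter tag (e.g. 'H'), integer id.
HYBRID_MARKER = re.compile(r"#([A-Za-z]*)(\d+)")


def hybrid_occurrences(enewick: str) -> Counter:
    """Count how often each hybrid-node id appears in the string."""
    return Counter(int(m.group(2)) for m in HYBRID_MARKER.finditer(enewick))


# Hypothetical network: hybrid node 1 has child B and two parents,
# so its marker appears once under each parent.
network = "((A,(B)#H1),(#H1,C));"
counts = hybrid_occurrences(network)

# One distinct id means one hybrid node (k = 1), written twice.
print(len(counts), dict(counts))
```

A plain Newick reader would treat the two `#H1` occurrences as unrelated labels; a network-aware reader would merge them into one node with two parents, which is what keeps the whole network in a single string.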
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, Spain.
|
37
|
Abstract
Problematica are taxa that defy robust phylogenetic placement. Traditionally the term was restricted to fossil forms, but it is clear that extant taxa may be just as difficult to place, whether using morphological or molecular (nucleotide, gene or genomic) markers for phylogeny reconstruction. We discuss the kinds and causes of Problematica within the Metazoa, as well as criteria for their recognition and possible solutions. The inclusive set of Problematica changes depending upon the nature and quality of (homologous) data available, the methods of phylogeny reconstruction and the sister taxa inferred by their placement or displacement. We address Problematica in the context of pre-cladistic phylogenetics, numerical morphological cladistics and molecular phylogenetics, and focus on general biological and methodological implications of Problematica, rather than presenting a review of individual taxa. Rather than excluding Problematica from phylogeny reconstruction, as has often been preferred, we conclude that the study of Problematica is crucial for both the resolution of metazoan phylogeny and the proper inference of body plan evolution.
Affiliation(s)
- Ronald A Jenner
- Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK.
|
38
|
Huerta-Cepas J, Bueno A, Dopazo J, Gabaldón T. PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucleic Acids Res 2007; 36:D491-6. [PMID: 17962297 PMCID: PMC2238872 DOI: 10.1093/nar/gkm899] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Indexed: 11/14/2022] Open
Abstract
The complete collection of evolutionary histories of all genes in a genome, also known as phylome, constitutes a valuable source of information. The reconstruction of phylomes has been previously prevented by large demands of time and computer power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. For each genome, PhylomeDB provides the alignments, phylogenetic trees and tree-based orthology predictions for every single encoded protein. The current version of PhylomeDB includes the phylomes of Human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.es
Affiliation(s)
- Jaime Huerta-Cepas
- Bioinformatics Department, Centro de Investigación Príncipe Felipe, Avda. Autopista del Saler, 13 Valencia 46013, Spain
|
39
|
Abstract
The GeneTrees phylogenomics system pursues comparative genomic analyses from the perspective of gene phylogenies for individual genes. The GeneTrees project has the goal of providing detailed evolutionary models for all protein-coding gene components of the fully sequenced genomes. Currently, a database of alignments and trees for all protein sequences for 325 fully sequenced and annotated prokaryote genomes is available. The prokaryote database contains 890 000 protein sequences organized into over 100 000 alignments, each described by a phylogenetic tree. An original homology group discovery tool assembles sets of related proteins from all versus all pairwise alignments. Multiple alignments for each homology group are stored and subjected to phylogenetic tree inference. A graphical web interface provides visual exploration of the GeneTrees database. Homology groups can be queried by sequence identifiers or annotation terms. Genomes can be browsed visually on a gene map of each chromosome or plasmid. Phylogenetic trees with support values are displayed in conjunction with the associated sequence alignment. A variety of classes of information can be selected to label the tree tips to aid in visual evaluation of annotation and gene function. This web interface is available at .
Affiliation(s)
- Allan W. Dickerman
- To whom correspondence should be addressed. Tel: +1 540 231 1397; Fax: +1 540 231 2606;
|
40
|
Stephenson EL, Braude PR, Mason C. Proposal for a universal minimum information convention for the reporting on the derivation of human embryonic stem cell lines. Regen Med 2006; 1:739-50. [PMID: 17465755 DOI: 10.2217/17460751.1.6.739] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022] Open
|