1
Alpeeva EV, Sharova NP, Sharov KS, Vorotelyak EA. Russian Biodiversity Collections: A Professional Opinion Survey. Animals (Basel) 2023; 13:3777. [PMID: 38136814 PMCID: PMC10740833 DOI: 10.3390/ani13243777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/15/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023] Open
Abstract
Biodiversity collections are important vehicles for protecting endangered wildlife in situations of adverse anthropogenic influence. In Russia, there are currently a number of institution- and museum-based biological collections, but there are no nation-wide centres of biodiversity collections. In this paper, we report on the results of our survey of 324 bioconservation, big-data, and ecology specialists from different regions of Russia in regard to the necessity to create several large national biodiversity centres of wildlife protection. The survey revealed specific goals that have to be fulfilled during the development of these centres for the protection and restoration of endangered wildlife species. The top three problems/tasks (topics) are the following: (1) the necessity to create large national centres for different types of specimens; (2) the full sequencing and creation of different "omic" (genomic, proteomic, transcriptomic, etc.) databases; (3) full digitisation of a biodiversity collection/centre. These goals may constitute a guideline for the future of biodiversity collections in Russia that would be targeted at protecting and restoring endangered species. With the due network service level, the translation of the website into English, and permission from the regulator (Ministry of Science and Higher Education of Russian Federation), it can also become an international project.
Affiliation(s)
- Konstantin S. Sharov
  - Koltzov Institute of Developmental Biology of Russian Academy of Sciences, 26 Vavilov Street, 119334 Moscow, Russia; (E.V.A.); (N.P.S.)
2
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Frontiers in Plant Science 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse, and the potential of answering novel research questions. Ontologies are a useful tool for semantically tagging datasets as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine- interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
  - Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
  - Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
3
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
- Meghan A Balk
  - Natural History Museum, University of Oslo, Oslo, Norway
- Robyn L Ball
  - The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Anita R Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Melissa Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Laura W Harris
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A McMurry
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Elliot Sollis
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Nicole Vasilevsky
  - Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
4
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. bioRxiv: The Preprint Server for Biology 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P. Balhoff
  - Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Meghan A. Balk
  - National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
- Robyn Ball
  - The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Anita R. Caron
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Melissa Haendel
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Laura W. Harris
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L. Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A. McMurry
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Christopher J. Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Damian Smedley
  - William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Elliot Sollis
  - European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Nicole Vasilevsky
  - Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
5
Blumberg K, Miller M, Ponsero A, Hurwitz B. Ontology-driven analysis of marine metagenomics: what more can we learn from our data? Gigascience 2023; 12:giad088. [PMID: 37941395 PMCID: PMC10632069 DOI: 10.1093/gigascience/giad088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 06/30/2023] [Accepted: 09/28/2023] [Indexed: 11/10/2023] Open
Abstract
Background: The proliferation of metagenomic sequencing technologies has enabled novel insights into the functional genomic potentials and taxonomic structure of microbial communities. However, cyberinfrastructure efforts to manage and enable the reproducible analysis of sequence data have not kept pace. Thus, there is increasing recognition of the need to make metagenomic data discoverable within machine-searchable frameworks compliant with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship. Although a variety of metagenomic web services exist, none currently leverage the hierarchically structured terminology encoded within common life science ontologies to programmatically discover data. Results: Here, we integrate large-scale marine metagenomic datasets with community-driven life science ontologies into a novel FAIR web service. This approach enables the retrieval of data discovered by intersecting the knowledge represented within ontologies against the functional genomic potential and taxonomic structure computed from marine sequencing data. Our findings highlight various microbial functional and taxonomic patterns relevant to the ecology of prokaryotes in various aquatic environments. Conclusions: In this work, we present and evaluate a novel Semantic Web architecture that can be used to ask novel biological questions of existing marine metagenomic datasets. Finally, the FAIR ontology searchable data products provided by our API can be leveraged by future research efforts.
Affiliation(s)
- Kai Blumberg
  - Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA
  - BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA
- Matthew Miller
  - BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA
- Alise Ponsero
  - Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA
  - BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA
  - Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, Helsinki 00290, Finland
- Bonnie Hurwitz
  - Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA
  - BIO5 Institute, University of Arizona, Tucson, AZ 85721, USA
6
Agosti D, Benichou L, Addink W, Arvanitidis C, Catapano T, Cochrane G, Dillen M, Döring M, Georgiev T, Gérard I, Groom Q, Kishor P, Kroh A, Kvaček J, Mergen P, Mietchen D, Pauperio J, Sautter G, Penev L. Recommendations for use of annotations and persistent identifiers in taxonomy and biodiversity publishing. Research Ideas and Outcomes 2022. [DOI: 10.3897/rio.8.e97374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Abstract
The paper summarises many years of discussions and experience of biodiversity publishers, organisations, research projects and individual researchers, and proposes recommendations for implementation of persistent identifiers for article metadata, structural elements (sections, subsections, figures, tables, references, supplementary materials and others) and data specific to biodiversity (taxonomic treatments, treatment citations, taxon names, material citations, gene sequences, specimens, scientific collections) in taxonomy and biodiversity publishing. The paper proposes best practices on how identifiers should be used in the different cases and on how they can be minted, cited, and expressed in the backend article XML to facilitate conversion to and further re-use of the article content as FAIR data. The paper also discusses several specific routes for post-publication re-use of semantically enhanced content through large biodiversity data aggregators such as the Global Biodiversity Information Facility (GBIF), the International Nucleotide Sequence Database Collaboration (INSDC) and others, and proposes specifications of both identifiers and XML tags to be used for that purpose. A summary table provides an account and overview of the recommendations. The guidelines are supported with examples from the existing publishing practices.
7
Abstract
Despite an ever-growing number of data sets that catalog and characterize interactions between microbes in different environments and conditions, many of these data are neither easily accessible nor intercompatible. These limitations present a major challenge to microbiome research by hindering the streamlined drawing of inferences across studies. Here, we propose guiding principles to make microbial interaction data more findable, accessible, interoperable, and reusable (FAIR). We outline specific use cases for interaction data that span the diverse space of microbiome research, and discuss the untapped potential for new insights that can be fulfilled through broader integration of microbial interaction data. These include, among others, the design of intercompatible synthetic communities for environmental, industrial, or medical applications, and the inference of novel interactions from disparate studies. Lastly, we envision potential trajectories for the deployment of FAIR microbial interaction data based on existing resources, reporting standards, and current momentum within the community.
Affiliation(s)
- Charlie Pauvert
  - Functional Microbiome Research Group, Institute of Medical Microbiology, University Hospital of RWTH, Aachen, Germany
- Dileep Kishore
  - Bioinformatics Program and Biological Design Center, Boston University, Boston, Massachusetts, USA
- Daniel Segrè
  - Bioinformatics Program and Biological Design Center, Boston University, Boston, Massachusetts, USA
  - Department of Biology, Department of Biomedical Engineering, Department of Physics, Boston University, Boston, Massachusetts, USA
8
Patel A, Jain S, Debnath NC, Lama V. InBiodiv-O. International Journal of Information System Modeling and Design 2022. [DOI: 10.4018/ijismd.315021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
To present the biodiversity information, a semantic model is required that connects all kinds of data about living creatures and their habitats. The model must be able to encode human knowledge for machines to be understood. Ontology offers the richest machine-interpretable semantics that are being extensively used in the biodiversity domain. Various ontologies are developed for the biodiversity domain; however, these ontologies are not capable to define the Indian biodiversity information though India is one of the megadiverse countries. To semantically analyze the Indian biodiversity information, it is crucial to build an ontology that describes all the terms of this domain. Since the curation of the ontology depends on the domain where these are used, there is no ideal methodology defined yet. The aim of this article is to develop an ontology that semantically encodes all the terms of Indian biodiversity information in all its dimensions based on the proposed methodology. The evaluation of the proposed ontology depicts that ontology is well built in the specified domain.
Affiliation(s)
- Sarika Jain
  - National Institute of Technology, Kurukshetra, India
9
Inglis LK, Edwards RA. How Metagenomics Has Transformed Our Understanding of Bacteriophages in Microbiome Research. Microorganisms 2022; 10:1671. [PMID: 36014086 PMCID: PMC9415785 DOI: 10.3390/microorganisms10081671] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/11/2022] [Revised: 08/15/2022] [Accepted: 08/16/2022] [Indexed: 11/16/2022] Open
Abstract
The microbiome is an essential part of most ecosystems. It was originally studied mostly through culturing but relatively few microbes can be cultured, so much of the microbiome was left unexplored. The emergence of metagenomic sequencing techniques changed that and allowed the study of microbiomes from all sorts of habitats. Metagenomic sequencing also allowed for a more thorough exploration of prophages, viruses that integrate into bacterial genomes, and how they benefit their hosts. One issue with using open-access metagenomic data is that sequences added to databases often have little to no metadata to work with, so finding enough sequences can be difficult. Many metagenomes have been manually curated but this is a time-consuming process and relies heavily on the uploader to be accurate and thorough when filling in metadata fields and the curators to be working with the same ontologies. Using algorithms to automatically sort metagenomes based on either the taxonomic profile or the functional profile may be a viable solution to the issues with manually curated metagenomes, but it requires that the algorithm is trained on carefully curated datasets and using the most informative profile possible in order to minimize errors.
10
Farrell MJ, Brierley L, Willoughby A, Yates A, Mideo N. Past and future uses of text mining in ecology and evolution. Proc Biol Sci 2022; 289:20212721. [PMID: 35582795 PMCID: PMC9114983 DOI: 10.1098/rspb.2021.2721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023] Open
Abstract
Ecology and evolutionary biology, like other scientific fields, are experiencing an exponential growth of academic manuscripts. As domain knowledge accumulates, scientists will need new computational approaches for identifying relevant literature to read and include in formal literature reviews and meta-analyses. Importantly, these approaches can also facilitate automated, large-scale data synthesis tasks and build structured databases from the information in the texts of primary journal articles, books, grey literature, and websites. The increasing availability of digital text, computational resources, and machine-learning based language models have led to a revolution in text analysis and natural language processing (NLP) in recent years. NLP has been widely adopted across the biomedical sciences but is rarely used in ecology and evolutionary biology. Applying computational tools from text mining and NLP will increase the efficiency of data synthesis, improve the reproducibility of literature reviews, formalize analyses of research biases and knowledge gaps, and promote data-driven discovery of patterns across ecology and evolutionary biology. Here we present recent use cases from ecology and evolution, and discuss future applications, limitations and ethical issues.
Affiliation(s)
- Maxwell J. Farrell
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Liam Brierley
  - Department of Health Data Science, University of Liverpool, Liverpool, UK
- Anna Willoughby
  - Odum School of Ecology, University of Georgia, Athens, GA, USA
  - Center for the Ecology of Infectious Diseases, University of Georgia, Athens, GA, USA
- Andrew Yates
  - University of Amsterdam, Amsterdam, The Netherlands
- Nicole Mideo
  - Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
11
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 DOI: 10.26434/chemrxiv.13524191] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/22/2023]
Abstract
Introduction: The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. Materials and Methods: We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property-object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. Results: The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. Discussion: SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. Conclusion: Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
12
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 PMCID: PMC8978481 DOI: 10.1089/aivt.2021.0010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]
Abstract
Introduction: The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. Materials and Methods: We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property–object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. Results: The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org Discussion: SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. Conclusion: Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T. Evelo
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L. Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
13
Thomer AK. Integrative data reuse at scientifically significant sites: Case studies at Yellowstone National Park and the La Brea Tar Pits. J Assoc Inf Sci Technol 2022. [DOI: 10.1002/asi.24620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Andrea K. Thomer
- School of Information, University of Michigan, Ann Arbor, Michigan, USA
|
14
|
Zafeiropoulos H, Paragkamian S, Ninidakis S, Pavlopoulos GA, Jensen LJ, Pafilis E. PREGO: A Literature and Data-Mining Resource to Associate Microorganisms, Biological Processes, and Environment Types. Microorganisms 2022; 10:293. [PMID: 35208748 PMCID: PMC8879827 DOI: 10.3390/microorganisms10020293] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/27/2021] [Revised: 01/19/2022] [Accepted: 01/20/2022] [Indexed: 12/12/2022] Open
Abstract
To elucidate ecosystem functioning, it is fundamental to recognize what processes occur in which environments (where) and which microorganisms carry them out (who). Here, we present PREGO, a one-stop-shop knowledge base providing such associations. PREGO combines text mining and data integration techniques to mine such what-where-who associations from data and metadata scattered in the scientific literature and in public omics repositories. Microorganisms, biological processes, and environment types are identified and mapped to ontology terms from established community resources. Analyses of comentions in text and co-occurrences in metagenomics data/metadata are performed to extract associations and a level of confidence is assigned to each of them thanks to a scoring scheme. The PREGO knowledge base contains associations for 364,508 microbial taxa, 1090 environmental types, 15,091 biological processes, and 7971 molecular functions with a total of almost 58 million associations. These associations are available through a web portal, an Application Programming Interface (API), and bulk download. By exploring environments and/or processes associated with each other or with microbes, PREGO aims to assist researchers in design and interpretation of experiments and their results. To demonstrate PREGO’s capabilities, a thorough presentation of its web interface is given along with a meta-analysis of experimental results from a lagoon-sediment study of sulfur-cycle related microbes.
Affiliation(s)
- Haris Zafeiropoulos
- Department of Biology, University of Crete, Voutes University Campus, P.O. Box 2208, 70013 Heraklion, Crete, Greece
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, P.O. Box 2214, 71003 Heraklion, Crete, Greece
- Savvas Paragkamian
- Department of Biology, University of Crete, Voutes University Campus, P.O. Box 2208, 70013 Heraklion, Crete, Greece
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, P.O. Box 2214, 71003 Heraklion, Crete, Greece
- Stelios Ninidakis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, P.O. Box 2214, 71003 Heraklion, Crete, Greece
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, P.O. Box 2214, 71003 Heraklion, Crete, Greece
- Correspondence: Tel.: +30-2810-337748
|
15
|
Blumberg KL, Ponsero AJ, Bomhoff M, Wood-Charlson EM, DeLong EF, Hurwitz BL. Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems. Front Microbiol 2021; 12:765268. [PMID: 34956127 PMCID: PMC8692764 DOI: 10.3389/fmicb.2021.765268] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2021] [Accepted: 11/16/2021] [Indexed: 11/13/2022] Open
Abstract
Marine microbial ecology requires the systematic comparison of biogeochemical and sequence data to analyze environmental influences on the distribution and variability of microbial communities. With ever-increasing quantities of metagenomic data, there is a growing need to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR) across diverse ecosystems. FAIR data is essential to developing analytical frameworks that integrate microbiological, genomic, ecological, oceanographic, and computational methods. Although community standards defining the minimal metadata required to accompany sequence data exist, they haven’t been consistently used across projects, precluding interoperability. Moreover, these data are not machine-actionable or discoverable by cyberinfrastructure systems. By making ‘omic and physicochemical datasets FAIR to machine systems, we can enable sequence data discovery and reuse based on machine-readable descriptions of environments or physicochemical gradients. In this work, we developed a novel technical specification for dataset encapsulation for the FAIR reuse of marine metagenomic and physicochemical datasets within cyberinfrastructure systems. This includes using Frictionless Data Packages enriched with terminology from environmental and life-science ontologies to annotate measured variables, their units, and the measurement devices used. This approach was implemented in Planet Microbe, a cyberinfrastructure platform and marine metagenomic web-portal. Here, we discuss the data properties built into the specification to make global ocean datasets FAIR within the Planet Microbe portal. We additionally discuss the selection of, and contributions to marine-science ontologies used within the specification. Finally, we use the system to discover data by which to answer various biological questions about environments, physicochemical gradients, and microbial communities in meta-analyses. 
This work represents a future direction in marine metagenomic research by proposing a specification for FAIR dataset encapsulation that, if adopted within cyberinfrastructure systems, would automate the discovery, exchange, and re-use of data needed to answer broader reaching questions than originally intended.
Affiliation(s)
- Kai L Blumberg
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
- Alise J Ponsero
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
- Matthew Bomhoff
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
- Elisha M Wood-Charlson
- E.O. Lawrence Berkeley National Laboratory, Environmental Genomics and Systems Biology Division, Berkeley, CA, United States
- Edward F DeLong
- Daniel K. Inouye Center for Microbial Oceanography, University of Hawai'i, Honolulu, HI, United States
- Bonnie L Hurwitz
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States; BIO5 Institute, University of Arizona, Tucson, AZ, United States
|
16
|
Springer N, Musengezi J, Hunter EO, Kaiser C, Shyamsundar P. Using economics in conservation practice: Insights from a global environmental organization. Conservation Science and Practice 2021. [DOI: 10.1111/csp2.377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022] Open
Affiliation(s)
- Nathaniel Springer
- Institute on the Environment, University of Minnesota, St Paul, Minnesota, USA
|
17
|
Yu J, Young RG, Deeth LE, Hanner RH. Molecular Detection Mapping and Analysis Platform for R (MDMAPR) facilitating the standardization, analysis, visualization, and sharing of qPCR data and metadata. PeerJ 2020; 8:e9974. [PMID: 33150057 PMCID: PMC7587055 DOI: 10.7717/peerj.9974] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2020] [Accepted: 08/26/2020] [Indexed: 11/30/2022] Open
Abstract
Quantitative polymerase chain reaction (qPCR) has been used as a standard molecular detection tool in many scientific fields. Unfortunately, there is no standard method for managing published qPCR data, and those currently used generally focus on only managing raw fluorescence data. However, associated with qPCR experiments are extensive sample and assay metadata, often under-examined and under-reported. Here, we present the Molecular Detection Mapping and Analysis Platform for R (MDMAPR), an open-source and fully scalable informatics tool for researchers to merge raw qPCR fluorescence data with associated metadata into a standard format, while geospatially visualizing the distribution of the data and relative intensity of the qPCR results. The advance of this approach is in the ability to use MDMAPR to store varied qPCR data. This includes pathogen and environmental qPCR species detection studies ideally suited to geographical visualization. However, it also goes beyond these and can be utilized with other qPCR data including gene expression studies, quantification studies used in identifying health dangers associated with food and water bacteria, and the identification of unknown samples. In addition, MDMAPR’s novel centralized management and geospatial visualization of qPCR data can further enable cross-discipline large-scale qPCR data standardization and accessibility to support research spanning multiple fields of science and qPCR applications.
Affiliation(s)
- Jiaojia Yu
- Integrative Biology, University of Guelph, Guelph, Ontario, Canada
- Robert G Young
- Integrative Biology, University of Guelph, Guelph, Ontario, Canada
- Lorna E Deeth
- Department of Mathematics and Statistics, University of Guelph, Guelph, Ontario, Canada
- Robert H Hanner
- Integrative Biology, University of Guelph, Guelph, Ontario, Canada
|
18
|
Thessen AE, Walls RL, Vogt L, Singer J, Warren R, Buttigieg PL, Balhoff JP, Mungall CJ, McGuinness DL, Stucky BJ, Yoder MJ, Haendel MA. Transforming the study of organisms: Phenomic data models and knowledge bases. PLoS Comput Biol 2020; 16:e1008376. [PMID: 33232313 PMCID: PMC7685442 DOI: 10.1371/journal.pcbi.1008376] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023] Open
Abstract
The rapidly decreasing cost of gene sequencing has resulted in a deluge of genomic data from across the tree of life; however, outside a few model organism databases, genomic data are limited in their scientific impact because they are not accompanied by computable phenomic data. The majority of phenomic data are contained in countless small, heterogeneous phenotypic data sets that are very difficult or impossible to integrate at scale because of variable formats, lack of digitization, and linguistic problems. One powerful solution is to represent phenotypic data using data models with precise, computable semantics, but adoption of semantic standards for representing phenotypic data has been slow, especially in biodiversity and ecology. Some phenotypic and trait data are available in a semantic language from knowledge bases, but these are often not interoperable. In this review, we will compare and contrast existing ontology and data models, focusing on nonhuman phenotypes and traits. We discuss barriers to integration of phenotypic data and make recommendations for developing an operationally useful, semantically interoperable phenotypic data ecosystem.
Affiliation(s)
- Anne E. Thessen
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
- Ronin Institute for Independent Scholarship, Montclair, New Jersey, United States of America
- Ramona L. Walls
- Bio5 Institute, University of Arizona, Tucson, Arizona, United States of America
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
- Pier Luigi Buttigieg
- Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremerhaven, Germany
- James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
- Matthew J. Yoder
- Illinois Natural History Survey, Champaign, Illinois, United States of America
- Melissa A. Haendel
- Environmental and Molecular Toxicology, Oregon State University, Corvallis, Oregon, United States of America
|
19
|
Samuel S, Shadaydeh M, Böcker S, Brügmann B, Bucher SF, Deckert V, Denzler J, Dittrich P, von Eggeling F, Güllmar D, Guntinas-Lichius O, König-Ries B, Löffler F, Maicher L, Marz M, Migliavacca M, Reichenbach JR, Reichstein M, Römermann C, Wittig A. A virtual “Werkstatt” for digitization in the sciences. Research Ideas and Outcomes 2020. [DOI: 10.3897/rio.6.e54106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023] Open
Abstract
Data is central in almost all scientific disciplines nowadays. Furthermore, intelligent systems have developed rapidly in recent years, so that in many disciplines the expectation is emerging that with the help of intelligent systems, significant challenges can be overcome and science can be done in completely new ways. In order for this to succeed, however, first, fundamental research in computer science is still required, and, second, generic tools must be developed on which specialized solutions can be built. In this paper, we introduce a recently started collaborative project funded by the Carl Zeiss Foundation, a virtual manufactory for digitization in the sciences, the “Werkstatt”, which is being established at the Michael Stifel Center Jena (MSCJ) for data-driven and simulation science to address fundamental questions in computer science and applications. The Werkstatt focuses on three key areas, which include generic tools for machine learning, knowledge generation using machine learning processes, and semantic methods for the data life cycle, as well as the application of these topics in different disciplines. Core and pilot projects address the key aspects of the topics and form the basis for sustainable work in the Werkstatt.
|
20
|
Blair J, Gwiazdowski R, Borrelli A, Hotchkiss M, Park C, Perrett G, Hanner R. Towards a catalogue of biodiversity databases: An ontological case study. Biodivers Data J 2020; 8:e32765. [PMID: 32269475 PMCID: PMC7125240 DOI: 10.3897/bdj.8.e32765] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/06/2019] [Accepted: 06/23/2019] [Indexed: 11/14/2022] Open
Abstract
Biodiversity informatics depends on digital access to credible information about species. Many online resources host species’ data, but the lack of categorisation for these resources inhibits the growth of this entire field. To explore possible solutions, we examined the (now retired) Biodiversity Information Projects of the World (BIPW) dataset created by the Biodiversity Information Standards (TDWG); this project, which ran from 2007-2015 (officially removed from the TDWG website in 2018) was an attempt at organising the Web's biodiversity databases into an indexed list. To do this, we applied a simple classification scheme to score databases within BIPW based on nine data categories, to characterise trends and current compositions of this biodiversity e-infrastructure. Primarily, we found that of 600 databases investigated from BIPW, only 315 (~53%) were accessible at the time of this writing, underscoring the precarious nature of the biodiversity information landscape. Many of these databases are still available, but suffer accessibility issues such as link rot, thus putting the information they contain in danger of being lost. We propose that a community-driven database of biodiversity databases with an accompanying ontology could facilitate efficient discovery of relevant biodiversity databases and support smaller databases – which have the greatest risk of being lost.
Affiliation(s)
- Jarrett Blair
- University of Guelph, Guelph, Canada
- Rodger Gwiazdowski
- University of Massachusetts Amherst, Amherst, MA, United States of America
- Andrew Borrelli
- University of Guelph, Guelph, Canada
- Candace Park
- University of Guelph, Guelph, Canada
- Gleannan Perrett
- University of Guelph, Guelph, Canada
- Robert Hanner
- University of Guelph, Guelph, Canada
|
21
|
Sima AC, Stockinger K, de Farias TM, Gil M. Semantic Integration and Enrichment of Heterogeneous Biological Databases. Methods Mol Biol 2020; 1910:655-690. [PMID: 31278681 DOI: 10.1007/978-1-4939-9074-0_22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/25/2023]
Abstract
Biological databases are growing at an exponential rate, currently being among the major producers of Big Data, almost on par with commercial generators, such as YouTube or Twitter. While traditionally biological databases evolved as independent silos, each purposely built by a different research group in order to answer specific research questions; more recently significant efforts have been made toward integrating these heterogeneous sources into unified data access systems or interoperable systems using the FAIR principles of data sharing. Semantic Web technologies have been key enablers in this process, opening the path for new insights into the unified data, which were not visible at the level of each independent database. In this chapter, we first provide an introduction into two of the most used database models for biological data: relational databases and RDF stores. Next, we discuss ontology-based data integration, which serves to unify and enrich heterogeneous data sources. We present an extensive timeline of milestones in data integration based on Semantic Web technologies in the field of life sciences. Finally, we discuss some of the remaining challenges in making ontology-based data access (OBDA) systems easily accessible to a larger audience. In particular, we introduce natural language search interfaces, which alleviate the need for database users to be familiar with technical query languages. We illustrate the main theoretical concepts of data integration through concrete examples, using two well-known biological databases: a gene expression database, Bgee, and an orthology database, OMA.
Affiliation(s)
- Ana Claudia Sima
- ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland; University of Lausanne, Lausanne, Switzerland
- Kurt Stockinger
- ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland
- Tarcisio Mendes de Farias
- University of Lausanne, Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Manuel Gil
- ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland; SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
22
|
Scott B, Baker E, Woodburn M, Vincent S, Hardy H, Smith VS. The Natural History Museum Data Portal. Database: The Journal of Biological Databases and Curation 2020; 2019:5432299. [PMID: 30985890 PMCID: PMC6459053 DOI: 10.1093/database/baz038] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 01/18/2019] [Revised: 03/01/2019] [Accepted: 03/04/2019] [Indexed: 11/13/2022]
Abstract
The Natural History Museum, London (NHM), generates and holds some of the largest global data sets relating to the biological and geological diversity of the natural world. A majority of these data were, until 2015, not widely accessible, and, even when published, were typically hard to find, poorly documented and in formats that impede discovery and integration. To better serve the bespoke needs of user communities outside and within the NHM, a dedicated data portal was developed to surface these data sets and provide a sustainable platform to encourage their citation and reuse. This paper describes the technical development of the data portal, from its inception to beta launch in December 2015, its first 2 years of operation, and future plans for the project. It outlines the development principles adopted for this prototypical project, which subsequently informed new digital project management methodologies at the NHM. The process of developing the data portal acted as a driver to implement policies necessary to encourage a culture of data sharing at the NHM.
Affiliation(s)
- Ben Scott
- Department of Life Sciences, Natural History Museum, London, UK
- Ed Baker
- Department of Life Sciences, Natural History Museum, London, UK
- Matt Woodburn
- Department of Life Sciences, Natural History Museum, London, UK
- Sarah Vincent
- Department of Life Sciences, Natural History Museum, London, UK
- Helen Hardy
- Department of Life Sciences, Natural History Museum, London, UK
- Vincent S Smith
- Department of Life Sciences, Natural History Museum, London, UK
|
23
|
Harjes J, Link A, Weibulat T, Triebel D, Rambold G. FAIR digital objects in environmental and life sciences should comprise workflow operation design data and method information for repeatability of study setups and reproducibility of results. Database: The Journal of Biological Databases and Curation 2020; 2020:5894776. [PMID: 32815545 PMCID: PMC7439577 DOI: 10.1093/database/baaa059] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/19/2020] [Revised: 07/01/2020] [Accepted: 07/07/2020] [Indexed: 12/23/2022]
Abstract
Repeatability of study setups and reproducibility of research results by underlying data are major requirements in science. Until now, abstract models for describing the structural logic of studies in environmental sciences are lacking and tools for data management are insufficient. Mandatory for repeatability and reproducibility is the use of sophisticated data management solutions going beyond data file sharing. Particularly, it implies maintenance of coherent data along workflows. Design data concern elements from elementary domains of operations being transformation, measurement and transaction. Operation design elements and method information are specified for each consecutive workflow segment from field to laboratory campaigns. The strict linkage of operation design element values, operation values and objects is essential. For enabling coherence of corresponding objects along consecutive workflow segments, the assignment of unique identifiers and the specification of their relations are mandatory. The abstract model presented here addresses these aspects, and the software DiversityDescriptions (DWB-DD) facilitates the management of thusly connected digital data objects and structures. DWB-DD allows for an individual specification of operation design elements and their linking to objects. Two workflow design use cases, one for DNA barcoding and another for cultivation of fungal isolates, are given. To publish those structured data, standard schema mapping and XML-provision of digital objects are essential. Schemas useful for this mapping include the Ecological Markup Language, the Schema for Meta-omics Data of Collection Objects and the Standard for Structured Descriptive Data. Data pipelines with DWB-DD include the mapping and conversion between schemas and functions for data publishing and archiving according to the Open Archival Information System standard. 
The setting allows for repeatability of study setups, reproducibility of study results and for supporting work groups to structure and maintain their data from the beginning of a study. The theory of ‘FAIR++’ digital objects is introduced.
Affiliation(s)
- Janno Harjes
- University of Bayreuth, Universitätsstraße 30, 95440 Bayreuth, Germany
- Anton Link
- Staatliche Naturwissenschaftliche Sammlungen Bayerns, Menzinger Straße 67, 80638 München, Germany
- Tanja Weibulat
- Staatliche Naturwissenschaftliche Sammlungen Bayerns, Menzinger Straße 67, 80638 München, Germany; German Federation for Biological Data e. V., Campus Ring 1, 28759 Bremen, Germany
- Dagmar Triebel
- Staatliche Naturwissenschaftliche Sammlungen Bayerns, Menzinger Straße 67, 80638 München, Germany; German Federation for Biological Data e. V., Campus Ring 1, 28759 Bremen, Germany
- Gerhard Rambold
- University of Bayreuth, Universitätsstraße 30, 95440 Bayreuth, Germany
|
24
|
Schneider FD, Fichtmueller D, Gossner MM, Güntsch A, Jochum M, König‐Ries B, Le Provost G, Manning P, Ostrowski A, Penone C, Simons NK. Towards an ecological trait‐data standard. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13288] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 11/27/2022]
Affiliation(s)
- Florian D. Schneider
- unaffiliated, c/o Birgitta König‐Ries, Department of Mathematics and Computer Science, Friedrich‐Schiller‐Universität Jena, Jena, Germany
- David Fichtmueller
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Martin M. Gossner
- Forest Entomology, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
- Anton Güntsch
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Malte Jochum
- Institute of Plant Sciences, University of Bern, Bern, Switzerland
- German Centre for Integrative Biodiversity Research (iDiv) Halle‐Jena‐Leipzig, Leipzig, Germany
- Institute of Biology, Leipzig University, Leipzig, Germany
- Birgitta König‐Ries
- Department of Mathematics and Computer Science, Friedrich‐Schiller‐Universität Jena, Jena, Germany
- Gaëtane Le Provost
- Senckenberg Biodiversity and Climate Research Centre (BiK‐F), Frankfurt am Main, Germany
- Peter Manning
- Senckenberg Biodiversity and Climate Research Centre (BiK‐F), Frankfurt am Main, Germany
- Andreas Ostrowski
- Department of Mathematics and Computer Science, Friedrich‐Schiller‐Universität Jena, Jena, Germany
- Caterina Penone
- Institute of Plant Sciences, University of Bern, Bern, Switzerland
- Nadja K. Simons
- Department of Ecology and Ecosystem Management, Technische Universität München, Freising, Germany
- Ecological Networks, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
|
25
|
Vogt L. Organizing phenotypic data-a semantic data model for anatomy. J Biomed Semantics 2019; 10:12. [PMID: 31221226 PMCID: PMC6585074 DOI: 10.1186/s13326-019-0204-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/30/2019] [Accepted: 06/05/2019] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Currently, almost all morphological data are published as unstructured free text descriptions. This not only brings about terminological problems regarding semantic transparency, which hampers their re-use by non-experts, but the data cannot be parsed by computers either, which in turn hampers their integration across many fields in the life sciences, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. With an ever-increasing amount of available ontologies and the development of adequate semantic technology, however, a solution to this problem becomes available. Instead of free text descriptions, morphological data can be recorded, stored, and communicated through the Web in the form of highly formalized and structured directed graphs (semantic graphs) that use ontology terms and URIs as terminology. RESULTS After introducing an instance-based approach of recording morphological descriptions as semantic graphs (i.e., Semantic Instance Anatomy Knowledge Graphs) and discussing accompanying metadata graphs, I propose a general scheme of how to efficiently organize the resulting graphs in a tuple store framework based on instances of defined named graph ontology classes. The use of such named graph resources allows meaningful fragmentation of the data, which in turn enables subsequent specification of all kinds of data views for managing and accessing morphological data. CONCLUSIONS Morphological data that comply with the here proposed semantic data model will not only be computer-parsable but also re-usable by non-experts and could be better integrated with other sources of data in the life sciences. This would allow morphology as a discipline to further participate in eScience and Big Data.
Affiliation(s)
- Lars Vogt
- Institut für Evolutionsbiologie und Ökologie, Rheinische Friedrich-Wilhelms-Universität Bonn, An der Immenburg 1, 53121, Bonn, Germany
|
26
|
LeFebvre MJ, Brenskelle L, Wieczorek J, Kansa SW, Kansa EC, Wallis NJ, King JN, Emery KF, Guralnick R. ZooArchNet: Connecting zooarchaeological specimens to the biodiversity and archaeology data networks. PLoS One 2019; 14:e0215369. [PMID: 30978247 PMCID: PMC6461259 DOI: 10.1371/journal.pone.0215369] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/12/2018] [Accepted: 04/01/2019] [Indexed: 11/30/2022] Open
Abstract
Interdisciplinary collaborations and data sharing are essential to addressing the long history of human-environmental interactions underlying the modern biodiversity crisis. Such collaborations are increasingly facilitated by, and dependent upon, sharing open access data from a variety of disciplinary communities and data sources, including those within biology, paleontology, and archaeology. Significant advances in biodiversity open data sharing have focused on neontological and paleontological specimen records, making available over a billion records through the Global Biodiversity Information Facility. But to date, less effort has been placed on the integration of important archaeological sources of biodiversity, such as zooarchaeological specimens. Zooarchaeological specimens are rich with both biological and cultural heritage data documenting nearly all phases of human interaction with animals and the surrounding environment through time, filling a critical gap between paleontological and neontological sources of data within biodiversity networks. Here we describe technical advances for mobilizing zooarchaeological specimen-specific biological and cultural data. In particular, we demonstrate adaptations in the workflow used by biodiversity publisher VertNet to mobilize Darwin Core formatted zooarchaeological data to the GBIF network. We also show how a linked open data approach can be used to connect existing biodiversity publishing mechanisms with archaeoinformatics publishing mechanisms through collaboration with the Open Context platform. Examples of ZooArchNet published datasets are used to show the efficacy of creating this critically needed bridge between biological and archaeological sources of open access data. These technical advances and efforts to support data publication are placed in the larger context of ZooArchNet, a new project meant to build community around new approaches to interconnect zooarchaeological data and knowledge across disciplines.
Collapse
Affiliation(s)
- Michelle J. LeFebvre
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| | - Laura Brenskelle
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| | - John Wieczorek
- Museum of Vertebrate Zoology, University of California, Berkeley, California, United States of America
| | - Sarah Whitcher Kansa
- Open Context, San Francisco, California, United States of America
- Archaeological Research Facility, University of California, Berkeley, California, United States of America
| | - Eric C. Kansa
- Open Context, San Francisco, California, United States of America
| | - Neill J. Wallis
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| | - Jessica N. King
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| | - Kitty F. Emery
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
| | - Robert Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, United States of America
|
27
|
Stucky BJ, Balhoff JP, Barve N, Barve V, Brenskelle L, Brush MH, Dahlem GA, Gilbert JDJ, Kawahara AY, Keller O, Lucky A, Mayhew PJ, Plotkin D, Seltmann KC, Talamas E, Vaidya G, Walls R, Yoder M, Zhang G, Guralnick R. Developing a vocabulary and ontology for modeling insect natural history data: example data, use cases, and competency questions. Biodivers Data J 2019; 7:e33303. [PMID: 30918448 PMCID: PMC6426826 DOI: 10.3897/bdj.7.e33303] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/24/2019] [Accepted: 02/28/2019] [Indexed: 11/12/2022] Open
Abstract
Insects are possibly the most taxonomically and ecologically diverse class of multicellular organisms on Earth. Consequently, they provide nearly unlimited opportunities to develop and test ecological and evolutionary hypotheses. Currently, however, large-scale studies of insect ecology, behavior, and trait evolution are impeded by the difficulty in obtaining and analyzing data derived from natural history observations of insects. These data are typically highly heterogeneous and widely scattered among many sources, which makes developing robust information systems to aggregate and disseminate them a significant challenge. As a step towards this goal, we report initial results of a new effort to develop a standardized vocabulary and ontology for insect natural history data. In particular, we describe a new database of representative insect natural history data derived from multiple sources (but focused on data from specimens in biological collections), an analysis of the abstract conceptual areas required for a comprehensive ontology of insect natural history data, and a database of use cases and competency questions to guide the development of data systems for insect natural history data. We also discuss data modeling and technology-related challenges that must be overcome to implement robust integration of insect natural history data.
Affiliation(s)
- Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | - James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, United States of America
| | - Narayani Barve
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | - Vijay Barve
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | - Laura Brenskelle
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | - Matthew H. Brush
- Oregon Health and Science University, Portland, OR, United States of America
| | - Gregory A Dahlem
- Department of Biological Sciences, Northern Kentucky University, Highland Heights, KY, United States of America
| | - James D. J. Gilbert
- Department of Biological and Marine Sciences, University of Hull, Hull, United Kingdom
| | - Akito Y. Kawahara
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
- Entomology and Nematology Department, University of Florida, Gainesville, FL, United States of America
| | - Oliver Keller
- Entomology and Nematology Department, University of Florida, Gainesville, FL, United States of America
| | - Andrea Lucky
- Entomology and Nematology Department, University of Florida, Gainesville, FL, United States of America
| | - Peter J. Mayhew
- Department of Biology, University of York, York, United Kingdom
| | - David Plotkin
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | | | - Elijah Talamas
- Florida Department of Agriculture and Consumer Services, Gainesville, FL, United States of America
| | - Gaurav Vaidya
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | - Ramona Walls
- Bio5 and CyVerse, University of Arizona, Tucson, AZ, United States of America
| | - Matt Yoder
- Species File Group, Illinois Natural History Survey, University of Illinois, Champaign, IL, United States of America
| | - Guanyang Zhang
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
| | - Rob Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States of America
|
28
|
Brenskelle L, Stucky BJ, Deck J, Walls R, Guralnick RP. Integrating herbarium specimen observations into global phenology data systems. Applications in Plant Sciences 2019; 7:e01231. [PMID: 30937223 PMCID: PMC6426164 DOI: 10.1002/aps3.1231] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/29/2018] [Accepted: 01/21/2019] [Indexed: 05/11/2023]
Abstract
PREMISE OF THE STUDY The Plant Phenology Ontology (PPO) was originally developed to integrate phenology observations of whole plants across different global observation networks. Here we describe a new release of the PPO and associated data pipelines that supports integration of phenology observations from herbarium specimens, which provide historical and modern phenology data. METHODS AND RESULTS Critical changes to the PPO include key terms that describe how measurements from parts of plants, which are captured in most imaged herbarium specimens, relate to whole plants. We provide proof of concept for ingesting annotations from imaged herbarium sheets of Prunus serotina, the common black cherry. We then provide an example analysis of changes in flowering timing over the past 125 years, demonstrating the value of integrating herbarium and observational phenology data sets. CONCLUSIONS These conceptual and technical advances will support the addition of phenology data from herbaria, but also could be expanded upon to facilitate the inclusion of data from photograph-based citizen science platforms. With the incorporation of herbarium phenology data, new historical baseline data will strengthen the capability to monitor, model, and forecast plant phenology changes.
Affiliation(s)
- Laura Brenskelle
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA
| | - Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA
| | - John Deck
- Berkeley Natural History Museums, University of California, Berkeley, California, USA
| | - Ramona Walls
- CyVerse, Bio5 Institute, The University of Arizona, Tucson, Arizona, USA
| | - Rob P. Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA
|
29
|
Hardisty AR, Michener WK, Agosti D, Alonso García E, Bastin L, Belbin L, Bowser A, Buttigieg PL, Canhos DA, Egloff W, De Giovanni R, Figueira R, Groom Q, Guralnick RP, Hobern D, Hugo W, Koureas D, Ji L, Los W, Manuel J, Manset D, Poelen J, Saarenmaa H, Schigel D, Uhlir PF, Kissling WD. The Bari Manifesto: An interoperability framework for essential biodiversity variables. Ecol Inform 2019. [DOI: 10.1016/j.ecoinf.2018.11.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 01/08/2023]
|
30
|
Dooley DM, Griffiths EJ, Gosal GS, Buttigieg PL, Hoehndorf R, Lange MC, Schriml LM, Brinkman FSL, Hsiao WWL. FoodOn: a harmonized food ontology to increase global food traceability, quality control and data integration. NPJ Sci Food 2018; 2:23. [PMID: 31304272 PMCID: PMC6550238 DOI: 10.1038/s41538-018-0032-6] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 02/12/2018] [Accepted: 09/25/2018] [Indexed: 11/09/2022] Open
Abstract
The construction of high capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. A well-defined, hierarchical vocabulary connected with logical relationships (in other words, an ontology) is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.
Affiliation(s)
- Damion M. Dooley
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Emma J. Griffiths
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Present Address: Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Gurinder S. Gosal
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Pier L. Buttigieg
- Alfred-Wegener-Institut, Helmholtz-Zentrum für Polar- und Meeresforschung, Bremen, Germany
| | - Robert Hoehndorf
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Matthew C. Lange
- Department of Food Science and Technology, UC Davis, Davis, CA, USA
| | - Lynn M. Schriml
- Epidemiology & Public Health, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Fiona S. L. Brinkman
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
| | - William W. L. Hsiao
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- British Columbia Centre for Disease Control Public Health Laboratory, Vancouver, BC, Canada
|
31
|
Venkatesan A, Tagny Ngompe G, Hassouni NE, Chentli I, Guignon V, Jonquet C, Ruiz M, Larmande P. Agronomic Linked Data (AgroLD): A knowledge-based system to enable integrative biology in agronomy. PLoS One 2018; 13:e0198270. [PMID: 30500839 PMCID: PMC6269127 DOI: 10.1371/journal.pone.0198270] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/14/2018] [Accepted: 09/03/2018] [Indexed: 12/22/2022] Open
Abstract
Recent advances in high-throughput technologies have resulted in a tremendous increase in the amount of omics data produced in plant science. This increase, in conjunction with the heterogeneity and variability of the data, presents a major challenge to adopting an integrative research approach. We are facing an urgent need to effectively integrate and assimilate complementary datasets to understand the biological system as a whole. The Semantic Web offers technologies for the integration of heterogeneous data and their transformation into explicit knowledge thanks to ontologies. We have developed Agronomic Linked Data (AgroLD, www.agrold.org), a knowledge-based system relying on Semantic Web technologies and exploiting standard domain ontologies, to integrate data about plant species of high interest for the plant science community, e.g., rice, wheat, and arabidopsis. We present some integration results of the project, which initially focused on genomics, proteomics and phenomics. AgroLD is now an RDF (Resource Description Framework) knowledge base of 100M triples created by annotating and integrating more than 50 datasets coming from 10 data sources (such as Gramene.org and TropGeneDB) with 10 ontologies (such as the Gene Ontology and Plant Trait Ontology). Our evaluation results show users appreciate the multiple query modes, which support different use cases. AgroLD's objective is to offer a domain-specific knowledge platform to solve complex biological and agronomical questions related to the implication of genes/proteins in, for instance, plant disease resistance or high yield traits. We expect the resolution of these questions to facilitate the formulation of new scientific hypotheses to be validated with a knowledge-oriented approach.
Affiliation(s)
- Aravind Venkatesan
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
| | - Gildas Tagny Ngompe
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
| | - Nordine El Hassouni
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- UMR AGAP, CIRAD, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
| | - Imene Chentli
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
| | - Valentin Guignon
- South Green Bioinformatics Platform, Montpellier, France
- Bioversity International, Montpellier, France
| | - Clement Jonquet
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
| | - Manuel Ruiz
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- UMR AGAP, CIRAD, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
- AGAP, Univ. of Montpellier, CIRAD, INRA, INRIA, SupAgro, Montpellier, France
| | - Pierre Larmande
- Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
- LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
- DIADE, IRD, Univ. of Montpellier, Montpellier, France
|
32
|
Thomer AK, Wickett KM, Baker KS, Fouke BW, Palmer CL. Documenting provenance in noncomputational workflows: Research process models based on geobiology fieldwork in Yellowstone National Park. J Assoc Inf Sci Technol 2018. [DOI: 10.1002/asi.24039] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Affiliation(s)
- Andrea K. Thomer
- School of Information, University of Michigan, 105 S. State Street, Ann Arbor, Michigan 48109, USA
| | - Karen M. Wickett
- School of Information, University of Texas at Austin, 1616 Guadalupe Suite #5.202, Austin, Texas 78701‐1213, USA
| | - Karen S. Baker
- INTERACT Research Unit, PO Box 8000, FI‐90014 University of Oulu, Finland; School of Information Sciences, University of Illinois at Urbana‐Champaign, 501 E. Daniel Street, Champaign, Illinois 61820, USA
| | - Bruce W. Fouke
- Department of Geology, University of Illinois Urbana‐Champaign, 1301 W. Green Street, Urbana, Illinois 61801, USA
- Department of Microbiology, University of Illinois Urbana‐Champaign, 601 S. Goodwin Avenue, Urbana, Illinois 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana‐Champaign, 1206 W. Gregory Drive, Urbana, Illinois 61801, USA
| | - Carole L. Palmer
- Information School, University of Washington, Box 352840, Mary Gates Hall, Ste. 370, Seattle, Washington 98195‐2840, USA
|
33
|
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022] Open
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
| | | | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal
|
34
|
Kissling WD, Walls R, Bowser A, Jones MO, Kattge J, Agosti D, Amengual J, Basset A, van Bodegom PM, Cornelissen JHC, Denny EG, Deudero S, Egloff W, Elmendorf SC, Alonso García E, Jones KD, Jones OR, Lavorel S, Lear D, Navarro LM, Pawar S, Pirzl R, Rüger N, Sal S, Salguero-Gómez R, Schigel D, Schulz KS, Skidmore A, Guralnick RP. Towards global data products of Essential Biodiversity Variables on species traits. Nat Ecol Evol 2018; 2:1531-1540. [PMID: 30224814 DOI: 10.1038/s41559-018-0667-3] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 02/25/2018] [Accepted: 07/16/2018] [Indexed: 02/03/2023]
Abstract
Essential Biodiversity Variables (EBVs) allow observation and reporting of global biodiversity change, but a detailed framework for the empirical derivation of specific EBVs has yet to be developed. Here, we re-examine and refine the previous candidate set of species traits EBVs and show how traits related to phenology, morphology, reproduction, physiology and movement can contribute to EBV operationalization. The selected EBVs express intra-specific trait variation and allow monitoring of how organisms respond to global change. We evaluate the societal relevance of species traits EBVs for policy targets and demonstrate how open, interoperable and machine-readable trait data enable the building of EBV data products. We outline collection methods, (meta)data standardization, reproducible workflows, semantic tools and licence requirements for producing species traits EBVs. Such an operationalization is critical for assessing progress towards biodiversity conservation and sustainable development goals and has wide implications for data-intensive science in ecology, biogeography, conservation and Earth observation.
Affiliation(s)
- W Daniel Kissling
- Department of Theoretical and Computational Ecology, Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
| | | | - Anne Bowser
- Woodrow Wilson International Center for Scholars, Washington DC, USA
| | - Matthew O Jones
- University of Montana, W. A. Franke Department of Forestry and Conservation, Missoula, MT, USA
| | - Jens Kattge
- Max Planck Institute for Biogeochemistry, Jena, Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
| | | | - Josep Amengual
- Area de Conservacion, Seguimiento y Programas de la Red, Organismo Autonomo Parques Nacionales, Ministerio de Agricultura y Pesca, Madrid, Spain
| | - Alberto Basset
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy
| | - Peter M van Bodegom
- Institute of Environmental Sciences, Leiden University, Leiden, The Netherlands
| | - Johannes H C Cornelissen
- Systems Ecology, Department of Ecological Science, Vrije Universiteit, Amsterdam, The Netherlands
| | - Ellen G Denny
- USA National Phenology Network, University of Arizona, Tucson, AZ, USA
| | - Salud Deudero
- Instituto Español de Oceanografía, Centro Oceanográfico de Baleares, Palma de Mallorca, Spain
| | | | - Sarah C Elmendorf
- National Ecological Observatory Network, Battelle Ecology, Boulder, CO, USA; Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
| | | | - Katherine D Jones
- National Ecological Observatory Network, Battelle Ecology, Boulder, CO, USA
| | - Owen R Jones
- Department of Biology, University of Southern Denmark, Odense M, Denmark
| | - Sandra Lavorel
- Laboratoire d'Ecologie Alpine, CNRS - Université Grenoble Alpes, Grenoble, France
| | - Dan Lear
- Marine Biological Association of the United Kingdom, Plymouth, Devon, UK
| | - Laetitia M Navarro
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany; Institute of Biology, Martin Luther University Halle Wittenberg, Halle (Saale), Germany
| | - Samraat Pawar
- Department of Life Sciences, Imperial College London, Ascot, Berkshire, UK
| | - Rebecca Pirzl
- CSIRO and Atlas of Living Australia, Canberra, Australian Capital Territory, Australia
| | - Nadja Rüger
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany; Smithsonian Tropical Research Institute, Ancon, Panama
| | - Sofia Sal
- Department of Life Sciences, Imperial College London, Ascot, Berkshire, UK
| | - Roberto Salguero-Gómez
- Department of Zoology, Oxford University, Oxford, UK; Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK; Centre for Biodiversity and Conservation Science, University of Queensland, St Lucia, Queensland, Australia; Evolutionary Demography Laboratory, Max Planck Institute for Demographic Research, Rostock, Germany
| | - Dmitry Schigel
- Global Biodiversity Information Facility (GBIF), Secretariat, Copenhagen, Denmark
| | - Katja-Sabine Schulz
- Smithsonian Institution, National Museum of Natural History, Washington DC, USA
| | - Andrew Skidmore
- Department of Natural Resources, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands; Department of Environmental Science, Macquarie University, New South Wales, Australia
| | - Robert P Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
|
35
|
Jonquet C, Toulet A, Dutta B, Emonet V. Harnessing the Power of Unified Metadata in an Ontology Repository: The Case of AgroPortal. Journal on Data Semantics 2018. [DOI: 10.1007/s13740-018-0091-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
|
36
|
Franz NM, Zhang C, Lee J. A logic approach to modelling nomenclatural change. Cladistics 2018; 34:336-357. [PMID: 34645079 DOI: 10.1111/cla.12201] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Accepted: 03/10/2017] [Indexed: 11/27/2022] Open
Abstract
We utilize an Answer Set Programming (ASP) approach to show that the principles of nomenclature are tractable in computational logic. To this end we design a hypothetical, 20 nomenclatural taxon use case, with starting conditions that embody several overarching principles of the International Code of Zoological Nomenclature, including Binomial Nomenclature, Priority, Coordination, Homonymy, Typification and the structural requirement of Gender Agreement. The use case ending conditions are triggered by the reinterpretation of the diagnostic features of one of 12 type specimens anchoring the corresponding species-level epithets. Permutations of this child-to-parent reassignment action lead to 36 alternative scenarios, where each scenario requires a set of 1-14 logically contingent nomenclatural emendations. We show that an ASP transition system approach can correctly infer the Code-mandated changes for each scenario, and visually output the ending conditions. The results provide a foundation for further developing logic-based nomenclatural change optimization and validation services, which could be applied in global nomenclatural registries. More generally, logic explorations of nomenclatural and taxonomic change scenarios provide a novel means of assessing design biases inherent in the principles of nomenclature, and can therefore inform the design of future, big data-compatible identifier systems that recognize and mitigate these constraints.
Affiliation(s)
- Nico M Franz
- School of Life Sciences, Arizona State University, PO Box 874501, Tempe, AZ, 85287-4501, USA
| | - Chao Zhang
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, PO Box 878809, Tempe, AZ, 85287-8809, USA
| | - Joohyung Lee
- School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, PO Box 878809, Tempe, AZ, 85287-8809, USA
|
37
|
Stucky BJ, Guralnick R, Deck J, Denny EG, Bolmgren K, Walls R. The Plant Phenology Ontology: A New Informatics Resource for Large-Scale Integration of Plant Phenology Data. Frontiers in Plant Science 2018; 9:517. [PMID: 29765382 PMCID: PMC5938398 DOI: 10.3389/fpls.2018.00517] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 01/01/2018] [Accepted: 04/04/2018] [Indexed: 05/25/2023]
Abstract
Plant phenology, the timing of plant life-cycle events such as flowering or leafing out, plays a fundamental role in the functioning of terrestrial ecosystems, including human agricultural systems. Because plant phenology is often linked with climatic variables, there is widespread interest in developing a deeper understanding of global plant phenology patterns and trends. Although phenology data from around the world are currently available, truly global analyses of plant phenology have so far been difficult because the organizations producing large-scale phenology data are using non-standardized terminologies and metrics during data collection and data processing. To address this problem, we have developed the Plant Phenology Ontology (PPO). The PPO provides the standardized vocabulary and semantic framework that is needed for large-scale integration of heterogeneous plant phenology data. Here, we describe the PPO, and we also report preliminary results of using the PPO and a new data processing pipeline to build a large dataset of phenology information from North America and Europe.
Affiliation(s)
- Brian J. Stucky
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
| | - Rob Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, FL, United States
| | - John Deck
- Berkeley Natural History Museums, University of California, Berkeley, Berkeley, CA, United States
| | - Ellen G. Denny
- USA National Phenology Network, The University of Arizona, Tucson, AZ, United States
| | - Kjell Bolmgren
- Unit for Field-based Forest Research, Swedish University of Agricultural Sciences, Lammhult, Sweden
| | - Ramona Walls
- CyVerse, The University of Arizona, Tucson, AZ, United States
| |
|
38
|
James SA, Soltis PS, Belbin L, Chapman AD, Nelson G, Paul DL, Collins M. Herbarium data: Global biodiversity and societal botanical needs for novel research. APPLICATIONS IN PLANT SCIENCES 2018; 6:e1024. [PMID: 29732255 PMCID: PMC5851569 DOI: 10.1002/aps3.1024] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/03/2017] [Accepted: 12/30/2017] [Indexed: 05/11/2023]
Abstract
Building on centuries of research based on herbarium specimens gathered through time and around the globe, a new era of discovery, synthesis, and prediction using digitized collections data has begun. This paper provides an overview of how aggregated, open access botanical and associated biological, environmental, and ecological data sets, from genes to the ecosystem, can be used to document the impacts of global change on communities, organisms, and society; predict future impacts; and help to drive the remediation of change. Advocacy for botanical collections and their expansion is needed, including ongoing digitization and online publishing. The addition of non-traditional digitized data fields, user annotation capability, and born-digital field data collection enables the rapid access of rich, digitally available data sets for research, education, informed decision-making, and other scholarly and creative activities. Researchers are receiving enormous benefits from data aggregators including the Global Biodiversity Information Facility (GBIF), Integrated Digitized Biocollections (iDigBio), the Atlas of Living Australia (ALA), and the Biodiversity Heritage Library (BHL), but effective collaboration around data infrastructures is needed when working with large and disparate data sets. Tools for data discovery, visualization, analysis, and skills training are increasingly important for inspiring novel research that improves the intrinsic value of physical and digital botanical collections.
Affiliation(s)
- Shelley A. James
- National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Mrs Macquaries Road, Sydney, New South Wales 2000, Australia
| | - Pamela S. Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, USA
| | - Lee Belbin
- Atlas of Living Australia, CSIRO, Clunies Ross Street, Acton, Australian Capital Territory 2601, Australia
| | - Arthur D. Chapman
- Australian Biodiversity Information Services, Ballan, Victoria 3342, Australia
| | - Gil Nelson
- iDigBio, Florida State University, Tallahassee, Florida 32306, USA
| | | | - Matthew Collins
- Advanced Computing and Information Systems, University of Florida, Gainesville, Florida 32611, USA
| |
|
39
|
Yost JM, Sweeney PW, Gilbert E, Nelson G, Guralnick R, Gallinat AS, Ellwood ER, Rossington N, Willis CG, Blum SD, Walls RL, Haston EM, Denslow MW, Zohner CM, Morris AB, Stucky BJ, Carter JR, Baxter DG, Bolmgren K, Denny EG, Dean E, Pearson KD, Davis CC, Mishler BD, Soltis PS, Mazer SJ. Digitization protocol for scoring reproductive phenology from herbarium specimens of seed plants. APPLICATIONS IN PLANT SCIENCES 2018; 6:e1022. [PMID: 29732253 PMCID: PMC5851559 DOI: 10.1002/aps3.1022] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/06/2017] [Accepted: 01/02/2018] [Indexed: 05/13/2023]
Abstract
PREMISE OF THE STUDY Herbarium specimens provide a robust record of historical plant phenology (the timing of seasonal events such as flowering or fruiting). However, the difficulty of aggregating phenological data from specimens arises from a lack of standardized scoring methods and definitions for phenological states across the collections community. METHODS AND RESULTS To address this problem, we report on a consensus reached by an iDigBio working group of curators, researchers, and data standards experts regarding an efficient scoring protocol and a data-sharing protocol for reproductive traits available from herbarium specimens of seed plants. The phenological data sets generated can be shared via Darwin Core Archives using the Extended MeasurementOrFact extension. CONCLUSIONS Our hope is that curators and others interested in collecting phenological trait data from specimens will use the recommendations presented here in current and future scoring efforts. New tools for scoring specimens are reviewed.
Affiliation(s)
- Jennifer M. Yost
- Department of Biological Sciences, California Polytechnic State University, 1 Grand Avenue, San Luis Obispo, California 93407, USA
| | - Patrick W. Sweeney
- Division of Botany, Peabody Museum of Natural History, Yale University, P.O. Box 208118, New Haven, Connecticut 06520, USA
| | - Ed Gilbert
- Arizona State University, School of Life Sciences, P.O. Box 874501, Tempe, Arizona 85287‐4501, USA
| | - Gil Nelson
- iDigBio, College of Communication and Information, Florida State University, Tallahassee, Florida 32306, USA
| | - Robert Guralnick
- Florida Museum of Natural History and Biodiversity Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Amanda S. Gallinat
- Boston University, Department of Biology, 5 Cummington Mall, Boston, Massachusetts 02215, USA
| | | | - Natalie Rossington
- Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, California 93106‐9620, USA
| | - Charles G. Willis
- Department of Organismic and Evolutionary Biology, Harvard University Herbaria, 22 Divinity Avenue, Cambridge, Massachusetts 02138, USA
- University of Minnesota, Department of Biology Teaching and Learning, 515 Delaware Street SE, Minneapolis, Minnesota 55455, USA
| | - Stanley D. Blum
- Biodiversity Information Standards (TDWG), 1342 34th Avenue, San Francisco, California 94122, USA
| | - Ramona L. Walls
- CyVerse, University of Arizona, 1657 East Helen Street, Tucson, Arizona 85721, USA
| | - Elspeth M. Haston
- Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh EH3 5LR, United Kingdom
| | - Michael W. Denslow
- Florida Museum of Natural History and Biodiversity Institute, University of Florida, Gainesville, Florida 32611, USA
- Department of Biology, Appalachian State University, Boone, North Carolina 28608, USA
| | - Constantin M. Zohner
- Systematic Botany and Mycology, Department of Biology, Munich University (LMU), 80638 Munich, Germany
| | - Ashley B. Morris
- Department of Biology, Middle Tennessee State University, Murfreesboro, Tennessee 37138, USA
| | - Brian J. Stucky
- Florida Museum of Natural History and Biodiversity Institute, University of Florida, Gainesville, Florida 32611, USA
| | | | - David G. Baxter
- University and Jepson Herbaria, University of California Berkeley, 1001 Valley Life Sciences Building, Berkeley, California 94720, USA
| | - Kjell Bolmgren
- Swedish University of Agricultural Sciences, Unit for Field‐based Forest Research, 360 30 Lammhult, Sweden
| | - Ellen G. Denny
- USA National Phenology Network, University of Arizona, Tucson, Arizona 85721, USA
| | - Ellen Dean
- UC Davis Center for Plant Diversity, Plant Sciences M.S. 7, One Shields Avenue, Davis, California 95616, USA
| | - Katelin D. Pearson
- Department of Biological Science, Florida State University, Tallahassee, Florida 32304, USA
| | - Charles C. Davis
- Department of Organismic and Evolutionary Biology, Harvard University Herbaria, 22 Divinity Avenue, Cambridge, Massachusetts 02138, USA
| | - Brent D. Mishler
- University and Jepson Herbaria, University of California Berkeley, 1001 Valley Life Sciences Building, Berkeley, California 94720, USA
- Department of Integrative Biology, University of California, Berkeley, California 94720‐2465, USA
| | - Pamela S. Soltis
- Florida Museum of Natural History and Biodiversity Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Susan J. Mazer
- Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, California 93106‐9620, USA
| |
|
40
|
Senderov V, Simov K, Franz N, Stoev P, Catapano T, Agosti D, Sautter G, Morris RA, Penev L. OpenBiodiv-O: ontology of the OpenBiodiv knowledge management system. J Biomed Semantics 2018; 9:5. [PMID: 29347997 PMCID: PMC5774086 DOI: 10.1186/s13326-017-0174-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2017] [Accepted: 12/28/2017] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND The biodiversity domain, and in particular biological taxonomy, is moving in the direction of semantization of its research outputs. The present work introduces OpenBiodiv-O, the ontology that serves as the basis of the OpenBiodiv Knowledge Management System. Our intent is to provide an ontology that fills the gaps between ontologies for biodiversity resources, such as DarwinCore-based ontologies, and semantic publishing ontologies, such as the SPAR Ontologies. We bridge this gap by providing an ontology focusing on biological taxonomy. RESULTS OpenBiodiv-O introduces classes, properties, and axioms in the domains of scholarly biodiversity publishing and biological taxonomy and aligns them with several important domain ontologies (FaBiO, DoCO, DwC, Darwin-SW, NOMEN, ENVO). By doing so, it bridges the ontological gap across scholarly biodiversity publishing and biological taxonomy and allows for the creation of a Linked Open Dataset (LOD) of biodiversity information (a biodiversity knowledge graph) and enables the creation of the OpenBiodiv Knowledge Management System. A key feature of the ontology is that it is an ontology of the scientific process of biological taxonomy and not of any particular state of knowledge. This feature allows it to express a multiplicity of scientific opinions. The resulting OpenBiodiv knowledge system may gain a high level of trust in the scientific community as it does not force a scientific opinion on its users (e.g. practicing taxonomists, library researchers, etc.), but rather provides the tools for experts to encode different views as science progresses. CONCLUSIONS OpenBiodiv-O provides a conceptual model of the structure of a biodiversity publication and the development of related taxonomic concepts. It also serves as the basis for the OpenBiodiv Knowledge Management System.
Affiliation(s)
- Viktor Senderov
- Pensoft Publishers, Prof. Georgi Zlatarski 12, Sofia, 1700 Bulgaria
- Institute of Biodiversity and Ecosystems Research, Bulgarian Academy of Sciences, Sofia, Bulgaria
| | - Kiril Simov
- Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
| | - Nico Franz
- Arizona State University, School of Life Sciences, Tempe Campus, Tempe, 4501 AZ USA
| | - Pavel Stoev
- Pensoft Publishers, Prof. Georgi Zlatarski 12, Sofia, 1700 Bulgaria
- National Museum of Natural History, 1 Tsar Osvoboditel Blvd., Sofia, 1000 Bulgaria
| | | | | | | | | | - Lyubomir Penev
- Pensoft Publishers, Prof. Georgi Zlatarski 12, Sofia, 1700 Bulgaria
- Institute of Biodiversity and Ecosystems Research, Bulgarian Academy of Sciences, Sofia, Bulgaria
| |
|
41
|
|
42
|
|
43
|
Rosati I, Bergami C, Stanca E, Roselli L, Tagliolato P, Oggioni A, Fiore N, Pugnetti A, Zingone A, Boggero A, Basset A. A thesaurus for phytoplankton trait-based approaches: Development and applicability. ECOL INFORM 2017. [DOI: 10.1016/j.ecoinf.2017.10.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
44
|
Kissling WD, Ahumada JA, Bowser A, Fernandez M, Fernández N, García EA, Guralnick RP, Isaac NJB, Kelling S, Los W, McRae L, Mihoub J, Obst M, Santamaria M, Skidmore AK, Williams KJ, Agosti D, Amariles D, Arvanitidis C, Bastin L, De Leo F, Egloff W, Elith J, Hobern D, Martin D, Pereira HM, Pesole G, Peterseil J, Saarenmaa H, Schigel D, Schmeller DS, Segata N, Turak E, Uhlir PF, Wee B, Hardisty AR. Building essential biodiversity variables (EBVs) of species distribution and abundance at a global scale. Biol Rev Camb Philos Soc 2017; 93:600-625. [DOI: 10.1111/brv.12359] [Citation(s) in RCA: 169] [Impact Index Per Article: 24.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Revised: 07/04/2017] [Accepted: 07/05/2017] [Indexed: 12/20/2022]
Affiliation(s)
- W. Daniel Kissling
- Department Theoretical and Computational Ecology, Institute for Biodiversity and Ecosystem Dynamics (IBED) University of Amsterdam, P.O. Box 94248 1090 GE Amsterdam The Netherlands
| | - Jorge A. Ahumada
- TEAM Network, Moore Center for Science, Conservation International, 2011 Crystal Dr. Suite 500 Arlington VA 22202 U.S.A
| | - Anne Bowser
- Woodrow Wilson International Center for Scholars, 1300 Pennsylvania Ave NW Washington DC 20004 U.S.A
| | - Miguel Fernandez
- Biodiversity Conservation Group, German Centre for Integrative Biodiversity Research (iDiv) Halle‐Jena‐Leipzig, Deutscher Platz 5e 04103 Leipzig Germany
- Institute of Biology Martin Luther University Halle‐Wittenberg Halle Germany
- Instituto de Ecología Universidad Mayor de San Andrés (UMSA), Campus Universitario, Cota cota La Paz Bolivia
| | - Néstor Fernández
- Biodiversity Conservation Group, German Centre for Integrative Biodiversity Research (iDiv) Halle‐Jena‐Leipzig, Deutscher Platz 5e 04103 Leipzig Germany
- Estación Biológica de Doñana EBD‐CSIC, Américo Vespucio s.n 41092 Sevilla Spain
| | - Enrique Alonso García
- Councillor of State of the Kingdom of Spain and Honorary Researcher of the Franklin Institute of the University of Alcalá Madrid Spain
| | - Robert P. Guralnick
- University of Florida Museum of Natural History, University of Florida at Gainesville Gainesville FL 32611‐2710 U.S.A
| | - Nick J. B. Isaac
- Biological Records Centre, Centre for Ecology & Hydrology, Maclean Building, Benson Lane, Crowmarsh Gifford OX10 8BB Wallingford U.K
| | - Steve Kelling
- Cornell Lab of Ornithology Cornell University, 158 Sapsucker Woods Rd Ithaca NY 14850 U.S.A
| | - Wouter Los
- Department Theoretical and Computational Ecology, Institute for Biodiversity and Ecosystem Dynamics (IBED) University of Amsterdam, P.O. Box 94248 1090 GE Amsterdam The Netherlands
| | - Louise McRae
- Institute of Zoology, Zoological Society of London, Regent's Park NW1 4RY London U.K
| | - Jean‐Baptiste Mihoub
- UPMC Université Paris 06, Muséum National d'Histoire Naturelle, CNRS, CESCO, UMR 7204 Sorbonne Universités, 61 rue Buffon 75005 Paris France
- Department of Conservation Biology UFZ‐Helmholtz Centre for Environmental Research, Permoserstr. 15 04318 Leipzig Germany
| | - Matthias Obst
- Department of Marine Sciences Göteborg University, Box 463 SE‐40530 Göteborg Sweden
- Gothenburg Global Biodiversity Centre, Box 461 SE‐405 30 Göteborg Sweden
| | - Monica Santamaria
- CNR‐Institute of Biomembranes and Bioenergetics, Amendola 165/A Street 70126 Bari Italy
| | - Andrew K. Skidmore
- Department of Natural Resources, Faculty of Geo‐Information Science and Earth Observation (ITC) University of Twente, P.O. Box 217 7500AE Enschede The Netherlands
| | - Kristen J. Williams
- Land and Water, Commonwealth Scientific and Industrial Research Organisation (CSIRO), PO Box 1600 Canberra Australian Capital Territory 2601 Australia
| | | | - Daniel Amariles
- Decision and Policy Analysis (DAPA), International Center for Tropical Agriculture (CIAT) AA6713 Cali Colombia
- Instituto Alexander von Humboldt CALLE 28A # 15‐09 Bogota D.C. Colombia
| | - Christos Arvanitidis
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research, Thalassokosmos, Former US Base at Gournes 71003 Heraklion, Crete Greece
| | - Lucy Bastin
- School of Engineering and Applied Science Aston University, Aston Triangle B4 7ET Birmingham U.K
- Knowledge Management Unit Joint Research Centre of the European Commission, Via Enrico Fermi 21027 Varese Italy
| | - Francesca De Leo
- CNR‐Institute of Biomembranes and Bioenergetics, Amendola 165/A Street 70126 Bari Italy
| | | | - Jane Elith
- School of BioSciences (Building 143) University of Melbourne Melbourne VIC 3010 Australia
| | - Donald Hobern
- Global Biodiversity Information Facility Secretariat, Universitetsparken 15 2100 København Ø Denmark
| | - David Martin
- Land and Water, Commonwealth Scientific and Industrial Research Organisation (CSIRO), PO Box 1600 Canberra Australian Capital Territory 2601 Australia
| | - Henrique M. Pereira
- Biodiversity Conservation Group, German Centre for Integrative Biodiversity Research (iDiv) Halle‐Jena‐Leipzig, Deutscher Platz 5e 04103 Leipzig Germany
- Institute of Biology Martin Luther University Halle‐Wittenberg Halle Germany
| | - Graziano Pesole
- CNR‐Institute of Biomembranes and Bioenergetics, Amendola 165/A Street 70126 Bari Italy
- Department of Biosciences, Biotechnology and Biopharmaceutics University of Bari “A. Moro”, via Orabona 4 70125 Bari Italy
| | - Johannes Peterseil
- Department for Ecosystem Research & Environmental Information Management Umweltbundesamt GmbH, Spittelauer Lände 5 1090 Vienna Austria
| | - Hannu Saarenmaa
- Department of Forest Sciences, University of Eastern Finland, Joensuu Science Park, Länsikatu 15 FI‐80110 Joensuu Finland
| | - Dmitry Schigel
- Global Biodiversity Information Facility Secretariat, Universitetsparken 15 2100 København Ø Denmark
| | - Dirk S. Schmeller
- UPMC Université Paris 06, Muséum National d'Histoire Naturelle, CNRS, CESCO, UMR 7204 Sorbonne Universités, 61 rue Buffon 75005 Paris France
- ECOLAB, Université de Toulouse, CNRS, INPT, UPS Toulouse France
| | - Nicola Segata
- Centre for Integrative Biology University of Trento, Via Sommarive 9 38123 Trento Italy
| | - Eren Turak
- NSW Office of Environment and Heritage, PO Box A290 Sydney South NSW 1232 Australia
- Australian Museum, 6 College Street Sydney NSW 2000 Australia
| | - Paul F. Uhlir
- Consultant, Data Policy and Management, P.O. Box 305, Callicoon NY 12723 U.S.A
| | - Brian Wee
- Massive Connections, 2410 17th St NW, Apt 306 Washington DC 20009 U.S.A
| | - Alex R. Hardisty
- School of Computer Science & Informatics Cardiff University, Queens Buildings, 5 The Parade Cardiff CF24 3AA U.K
| |
|
45
|
Vanderbilt K, Porter JH, Lu SS, Bertrand N, Blankman D, Guo X, He H, Henshaw D, Jeong K, Kim ES, Lin CC, O'Brien M, Osawa T, Ó Tuama É, Su W, Yang H. A prototype system for multilingual data discovery of International Long-Term Ecological Research (ILTER) Network data. ECOL INFORM 2017. [DOI: 10.1016/j.ecoinf.2016.11.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
46
|
Palmer CL, Thomer AK, Baker KS, Wickett KM, Hendrix CL, Rodman A, Sigler S, Fouke BW. Site-based data curation based on hot spring geobiology. PLoS One 2017; 12:e0172090. [PMID: 28253269 PMCID: PMC5333826 DOI: 10.1371/journal.pone.0172090] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2016] [Accepted: 01/31/2017] [Indexed: 11/18/2022] Open
Abstract
Site-Based Data Curation (SBDC) is an approach to managing research data that prioritizes sharing and reuse of data collected at scientifically significant sites. The SBDC framework is based on geobiology research at natural hot spring sites in Yellowstone National Park as an exemplar case of high value field data in contemporary, cross-disciplinary earth systems science. Through stakeholder analysis and investigation of data artifacts, we determined that meaningful and valid reuse of digital hot spring data requires systematic documentation of sampling processes and particular contextual information about the site of data collection. We propose a Minimum Information Framework for recording the necessary metadata on sampling locations, with anchor measurements and description of the hot spring vent distinct from the outflow system, and multi-scale field photography to capture vital information about hot spring structures. The SBDC framework can serve as a global model for the collection and description of hot spring systems field data that can be readily adapted for application to the curation of data from other kinds of scientifically significant sites.
Affiliation(s)
- Carole L. Palmer
- The Information School, University of Washington, Mary Gates Hall, Suite 370, Seattle, Washington, United States of America
| | - Andrea K. Thomer
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois United States of America
| | - Karen S. Baker
- School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, Illinois United States of America
| | - Karen M. Wickett
- School of Information, University of Texas at Austin, 1616 Guadalupe Suite #5.202, Austin, Texas, United States of America
| | - Christie L. Hendrix
- Yellowstone Center for Resources, Yellowstone National Park, Yellowstone National Park, Wyoming United States of America
| | - Ann Rodman
- Yellowstone Center for Resources, Yellowstone National Park, Yellowstone National Park, Wyoming United States of America
| | - Stacey Sigler
- Yellowstone Center for Resources, Yellowstone National Park, Yellowstone National Park, Wyoming United States of America
| | - Bruce W. Fouke
- Department of Geology, University of Illinois Urbana-Champaign, Urbana, Illinois United States of America
- Department of Microbiology, University of Illinois Urbana-Champaign, 601 S. Goodwin Avenue, Urbana, Illinois United States of America
- Carl R. Woese Institute for Genomic Biology, University of Illinois Urbana-Champaign, 1206 W. Gregory Drive, Urbana, Illinois United States of America
- Roy J. Carver Biotechnology Center, University of Illinois Urbana-Champaign, 2613 Institute for Genomic Biology, 1206 W. Gregory Drive, Urbana, Illinois United States of America
- Thermal Biology Institute, Montana State University, Leon Johnson Hall, Bozeman, Montana, United States of America
| |
|
47
|
Zhang LY, Ren JD, Li XW. OIM-SM: A method for ontology integration based on semantic mapping. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2017. [DOI: 10.3233/jifs-161553] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- Ling-Yu Zhang
- Institute of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei Province, China
- College of Economics and Management, Qiqihar University, Qiqihar, Heilongjiang Province, China
| | - Jia-Dong Ren
- Institute of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei Province, China
| | - Xian-Wei Li
- Software Development Centre, Agricultural Bank of China, Beijing, China
| |
|
48
|
Coetzer W, Moodley D, Gerber A. Eliciting and Representing High-Level Knowledge Requirements to Discover Ecological Knowledge in Flower-Visiting Data. PLoS One 2016; 11:e0166559. [PMID: 27851814 PMCID: PMC5113002 DOI: 10.1371/journal.pone.0166559] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2016] [Accepted: 10/30/2016] [Indexed: 12/04/2022] Open
Abstract
Observations of individual organisms (data) can be combined with expert ecological knowledge of species, especially causal knowledge, to model and extract from flower–visiting data useful information about behavioral interactions between insect and plant organisms, such as nectar foraging and pollen transfer. We describe and evaluate a method to elicit and represent such expert causal knowledge of behavioral ecology, and discuss the potential for wider application of this method to the design of knowledge-based systems for knowledge discovery in biodiversity and ecosystem informatics.
Affiliation(s)
- Willem Coetzer
- SAIAB: South African Institute for Aquatic Biodiversity, Private Bag 1015, Grahamstown 6140, South Africa
- CAIR: Centre for Artificial Intelligence Research, CSIR Meraka, PO Box 395, Pretoria, 0001, South Africa
- School of Mathematics, Statistics and Computer Science, University of KwaZulu–Natal, Private Bag X54001, Durban 4000, South Africa
| | - Deshendran Moodley
- CAIR: Centre for Artificial Intelligence Research, CSIR Meraka, PO Box 395, Pretoria, 0001, South Africa
- Department of Computer Science, University of Cape Town, Private Bag X3, Rondebosch, 7701, South Africa
| | - Aurona Gerber
- CAIR: Centre for Artificial Intelligence Research, CSIR Meraka, PO Box 395, Pretoria, 0001, South Africa
- Department of Informatics, University of Pretoria, Private Bag X20, Hatfield, 0028, South Africa
| |
|
49
|
Hoehndorf R, Alshahrani M, Gkoutos GV, Gosline G, Groom Q, Hamann T, Kattge J, de Oliveira SM, Schmidt M, Sierra S, Smets E, Vos RA, Weiland C. The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants. J Biomed Semantics 2016; 7:65. [PMID: 27842607 PMCID: PMC5109718 DOI: 10.1186/s13326-016-0107-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Accepted: 11/01/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text. RESULTS We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities. CONCLUSIONS The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established vocabularies or ontologies, but rather serve as an overarching framework based on which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.
Affiliation(s)
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
| | - Mona Alshahrani
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
| | - Georgios V. Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT United Kingdom
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX United Kingdom
| | - George Gosline
- Royal Botanical Gardens, Kew, Richmond, Surrey, TW9 3AB United Kingdom
| | - Quentin Groom
- Botanic Garden Meise, Nieuwelaan 38, Meise, 1860 Belgium
| | - Thomas Hamann
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Jens Kattge
- Max Planck Institute for Biogeochemistry, Hans Knoell Str. 10, Jena, 07745 Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, 04103 Germany
| | | | - Marco Schmidt
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
| | - Soraya Sierra
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Erik Smets
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Rutger A. Vos
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
| | - Claus Weiland
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
| |
|
50
|
Buttigieg PL, Pafilis E, Lewis SE, Schildhauer MP, Walls RL, Mungall CJ. The environment ontology in 2016: bridging domains with increased scope, semantic density, and interoperation. J Biomed Semantics 2016; 7:57. [PMID: 27664130 PMCID: PMC5035502 DOI: 10.1186/s13326-016-0097-6] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2016] [Accepted: 09/03/2016] [Indexed: 01/04/2023] Open
Abstract
Background The Environment Ontology (ENVO; http://www.environmentontology.org/), first described in 2013, is a resource and research target for the semantically controlled description of environmental entities. The ontology's initial aim was the representation of the biomes, environmental features, and environmental materials pertinent to genomic and microbiome-related investigations. However, the need for environmental semantics is common to a multitude of fields, and ENVO's use has steadily grown since its initial description. We have thus expanded, enhanced, and generalised the ontology to support its increasingly diverse applications. Methods We have updated our development suite to promote expressivity, consistency, and speed: we now develop ENVO in the Web Ontology Language (OWL) and employ templating methods to accelerate class creation. We have also taken steps to better align ENVO with the Open Biological and Biomedical Ontologies (OBO) Foundry principles and interoperate with existing OBO ontologies. Further, we applied text-mining approaches to extract habitat information from the Encyclopedia of Life and automatically create experimental habitat classes within ENVO. Results Relative to its state in 2013, ENVO's content, scope, and implementation have been enhanced and much of its existing content revised for improved semantic representation. ENVO now offers representations of habitats, environmental processes, anthropogenic environments, and entities relevant to environmental health initiatives and the global Sustainable Development Agenda for 2030. Several branches of ENVO have been used to incubate and seed new ontologies in previously unrepresented domains such as food and agronomy. The current release version of the ontology, in OWL format, is available at http://purl.obolibrary.org/obo/envo.owl. 
Conclusions ENVO has been shaped into an ontology which bridges multiple domains including biomedicine, natural and anthropogenic ecology, ‘omics, and socioeconomic development. Through continued interactions with our users and partners, particularly those performing data archiving and synthesis, we anticipate that ENVO’s growth will accelerate in 2017. As always, we invite further contributions and collaboration to advance the semantic representation of the environment, ranging from geographic features and environmental materials, across habitats and ecosystems, to everyday objects in household settings.
Affiliation(s)
- Pier Luigi Buttigieg
- Alfred Wegener Institut, Helmholtz Zentrum für Polar- und Meeresforschung, Am Handelshafen 12, 27570, Bremerhaven, Germany.
| | - Evangelos Pafilis
- Institute of Marine Biology Biotechnology and Aquaculture, Hellenic Centre for Marine Research, P.O Box 2214, Heraklion, 71003, Crete, Greece
| | - Suzanna E Lewis
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Mark P Schildhauer
- National Center for Ecological Analysis and Synthesis, Univ. of Calif. Santa Barbara, Santa Barbara, CA, 93101, USA
| | - Ramona L Walls
- CyVerse, Thomas J. Keating Bioresearch Building, 1657 East Helen St, Tucson, AZ, 85721, USA
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| |
|