1
Lokatis S, Jeschke JM, Bernard-Verdier M, Buchholz S, Grossart HP, Havemann F, Hölker F, Itescu Y, Kowarik I, Kramer-Schadt S, Mietchen D, Musseau CL, Planillo A, Schittko C, Straka TM, Heger T. Hypotheses in urban ecology: building a common knowledge base. Biol Rev Camb Philos Soc 2023; 98:1530-1547. [PMID: 37072921] [DOI: 10.1111/brv.12964]
Abstract
Urban ecology is a rapidly growing research field that has to keep pace with the pressing need to tackle the sustainability crisis. Because it is an inherently multi-disciplinary field with close ties to practitioners and administrators, research synthesis and knowledge transfer between these different stakeholders are crucial. Knowledge maps can enhance knowledge transfer and provide orientation to researchers as well as practitioners. A promising option for developing such knowledge maps is to create hypothesis networks, which structure existing hypotheses and aggregate them according to topics and research aims. Combining expert knowledge with information from the literature, we here identify 62 research hypotheses used in urban ecology and link them in such a network. Our network clusters hypotheses into four distinct themes: (i) Urban species traits & evolution, (ii) Urban biotic communities, (iii) Urban habitats and (iv) Urban ecosystems. We discuss the potential and limitations of this approach. All information is openly provided as part of an extendable Wikidata project, and we invite researchers, practitioners and others interested in urban ecology to contribute additional hypotheses, as well as to comment on and extend the existing ones. The hypothesis network and Wikidata project form a first step towards a knowledge base for urban ecology, which can be expanded and curated to benefit both practitioners and researchers.
Affiliation(s)
- Sophie Lokatis
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr. 4, Leipzig, 04103, Germany
- Jonathan M Jeschke
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Maud Bernard-Verdier
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Sascha Buchholz
- Institute of Landscape Ecology, University of Münster, Heisenbergstr. 2, Münster, 48149, Germany
- Hans-Peter Grossart
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Institute of Biochemistry and Biology, Potsdam University, Maulbeerallee 2, Potsdam, 14469, Germany
- Frank Havemann
- Institut für Bibliotheks- und Informationswissenschaft, Humboldt-Universität zu Berlin, Dorotheenstraße 26, Berlin, 10117, Germany
- Franz Hölker
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Yuval Itescu
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Ingo Kowarik
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Institute of Ecology, Technische Universität Berlin, Rothenburgstr. 12, Berlin, 12165, Germany
- Stephanie Kramer-Schadt
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Institute of Ecology, Technische Universität Berlin, Rothenburgstr. 12, Berlin, 12165, Germany
- Leibniz Institute for Zoo and Wildlife Research (IZW), Alfred-Kowalke-Str. 17, Berlin, 10315, Germany
- Daniel Mietchen
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Institute for Globally Distributed Open Research and Education (IGDORE), Gothenburg, Sweden
- Camille L Musseau
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Aimara Planillo
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Leibniz Institute for Zoo and Wildlife Research (IZW), Alfred-Kowalke-Str. 17, Berlin, 10315, Germany
- Conrad Schittko
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Institute of Ecology, Technische Universität Berlin, Rothenburgstr. 12, Berlin, 12165, Germany
- Tanja M Straka
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Institute of Ecology, Technische Universität Berlin, Rothenburgstr. 12, Berlin, 12165, Germany
- Tina Heger
- Institute of Biology, Freie Universität Berlin, Königin-Luise-Str. 1-3, Berlin, 14195, Germany
- Leibniz Institute of Freshwater Ecology and Inland Fisheries (IGB), Müggelseedamm 310, Berlin, 12587, Germany
- Berlin-Brandenburg Institute of Advanced Biodiversity Research, Königin-Luise-Str. 2-4, Berlin, 14195, Germany
- Technical University of Munich, Restoration Ecology, Emil-Ramann-Str. 6, Freising, 85350, Germany
2
Zhou J, Zeng W, Xu H, Zhao X. Active Temporal Knowledge Graph Alignment. Int J Semant Web Inf 2023. [DOI: 10.4018/ijswis.318339]
Abstract
Entity alignment aims to identify equivalent entity pairs across different knowledge graphs (KGs). Aligning temporal knowledge graphs (TKGs), which attach time information to facts, has recently attracted increasing interest, as the time dimension is widely used in real-life applications. Matching TKGs requires seed entity pairs, which are scarce in practice, so it is of great significance to study TKG alignment under limited supervision. In this work, the authors formally formulate the problem of TKG alignment with limited labeled data and propose to solve it within the active learning framework. Since the core of active learning is devising query strategies that select the most informative instances to label, the authors make full use of time information and put forward novel time-aware strategies to meet the requirements of weakly supervised temporal entity alignment. Extensive experiments on multiple real-world datasets show that studying TKG alignment under scarce supervision is important and that the proposed time-aware strategies are effective.
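The time-aware selection idea can be illustrated with a toy query strategy. This is a sketch under stated assumptions, not the authors' actual method: the entity names, the Jaccard overlap of timestamp sets, and the "closest to 0.5 is most uncertain" heuristic are all inventions for the example.

```python
def time_signature(facts):
    """Timestamps at which an entity participates in (entity, relation, time) facts."""
    return {t for (_, _, t) in facts}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_queries(candidates, facts1, facts2, k):
    """Pick the k candidate entity pairs whose temporal-overlap score is
    closest to 0.5, i.e. the pairs a matcher is least sure about and which
    are therefore most informative to label."""
    def uncertainty(pair):
        e1, e2 = pair
        overlap = jaccard(time_signature(facts1[e1]), time_signature(facts2[e2]))
        return abs(overlap - 0.5)  # smaller means more uncertain
    return sorted(candidates, key=uncertainty)[:k]
```

A real system would combine such temporal evidence with embedding-based similarity; the sketch only shows where a time-aware signal enters the query strategy.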
Affiliation(s)
- Jie Zhou
- Laboratory for Big Data and Decision, National University of Defense Technology, China
- Weixin Zeng
- Laboratory for Big Data and Decision, National University of Defense Technology, China
- Hao Xu
- Laboratory for Big Data and Decision, National University of Defense Technology, China
- Xiang Zhao
- Laboratory for Big Data and Decision, National University of Defense Technology, China
3
Liu R, Yin G, Liu Z, Zhang L. PTKE: Translation-Based Temporal Knowledge Graph Embedding in Polar Coordinate System. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.079]
4
Wang X, Lyu S, Wang X, Wu X, Chen H. Temporal Knowledge Graph Embedding via Sparse Transfer Matrix. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.019]
5
Chistova EV, Larionov DS, Latypova EA, Shelmanov AO, Smirnov IV. Open Information Extraction from Texts: Part III. Question Answering over an Automatically Constructed Knowledge Base. Scientific and Technical Information Processing 2022. [DOI: 10.3103/s014768822206003x]
6
Chansanam W, Jaroenruen Y, Kaewboonma N, Tuamsuk K. Culture knowledge graph construction techniques. Education for Information 2022. [DOI: 10.3233/efi-220028]
Abstract
This article describes the development process of the Thai cultural knowledge graph, which facilitates a more precise and rapid comprehension of the culture and customs of Thailand. The construction process is as follows. First, data collection technologies and techniques were used to obtain text data from the Wikipedia encyclopedia about cultural traditions in Thailand. Second, entity recognition and relationship extraction were performed on the structured text set; a natural language processing (NLP) technique was used to characterize and extract textual resources from Wikipedia to support a deeper understanding of user-generated content with automatic tools. A BiLSTM-based model was used for entity recognition and for extracting relationships between entities. After the entities and their relationships were obtained, triple data were generated from the semistructured data in the existing knowledge base. A knowledge graph was then created, the knowledge bases were stored in Neo4j Desktop, and the quality and performance of the created knowledge graph were assessed. According to the experimental findings, the precision is 84.73%, the recall is 82.26%, and the F1-score is 83.47%; the BiLSTM-CNN-CRF model can therefore successfully extract entities from the structured text.
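As a quick consistency check on the reported evaluation numbers, the F1-score is the harmonic mean of precision and recall; recomputing it from the reported precision and recall reproduces the reported value up to rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract (in percent); the computed F1 of ~83.48
# agrees with the reported 83.47 up to rounding.
reported_p, reported_r = 84.73, 82.26
f1 = f1_score(reported_p, reported_r)
```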
Affiliation(s)
- Wirapong Chansanam
- Department of Information Science, Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
- Yuttana Jaroenruen
- Informatics Innovative Center of Excellence, Walailak University, Thai Buri, Nakhon Si Thammarat, Thailand
- Nattapong Kaewboonma
- Rajamangala University of Technology Srivijaya, Thung Song, Nakhon Si Thammarat, Thailand
- Kulthida Tuamsuk
- Department of Information Science, Faculty of Humanities and Social Sciences, Khon Kaen University, Khon Kaen, Thailand
7
He P, Zhou G, Zhang M, Wei J, Chen J. Improving temporal knowledge graph embedding using tensor factorization. Appl Intell 2022. [DOI: 10.1007/s10489-021-03149-w]
8
Abstract
Biological taxonomy rests on a long tail of publications spanning nearly three centuries. Not only is this literature vital to resolving disputes about taxonomy and nomenclature, for many species it represents a key source (indeed, sometimes the only source) of information about that species. Unlike other disciplines such as biomedicine, the taxonomic community lacks a centralised, curated literature database (the "bibliography of life"). This article argues that Wikidata can be that database, as it has flexible and sophisticated models of bibliographic information and an active community of people and programs ("bots") adding, editing, and curating that information.
Affiliation(s)
- Roderic D. M. Page
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary & Life Sciences, University of Glasgow, Glasgow, United Kingdom
9
Govindapillai S, Soon LK, Haw SC. An empirical study on Resource Description Framework reification for trustworthiness in knowledge graphs. F1000Res 2021; 10:881. [PMID: 34900233] [PMCID: PMC8634049] [DOI: 10.12688/f1000research.72843.2]
Abstract
A knowledge graph (KG) publishes a machine-readable representation of knowledge on the Web. Structured data in a knowledge graph is published using the Resource Description Framework (RDF), where knowledge is represented as a triple (subject, predicate, object). Because knowledge graphs contain erroneous, outdated or conflicting data, the quality of facts cannot be guaranteed. The trustworthiness of facts in a knowledge graph can be enhanced by adding metadata such as the source of the information and the location and time of the fact's occurrence. Since plain RDF triples do not support metadata for provenance and contextualization, most knowledge graphs employ an alternate method, RDF reification. RDF reification increases the magnitude of the data, as several statements are required to represent a single fact. Another limitation for applications that use provenance data, such as in the medical domain and in cybersecurity, is that not all facts in these knowledge graphs are annotated with provenance data. In this paper, we provide an overview of prominent reification approaches, together with an analysis of how the popular, general knowledge graphs Wikidata and YAGO4 represent provenance and context data. Wikidata employs qualifiers to attach metadata to facts, while YAGO4 collects metadata from Wikidata qualifiers. However, facts in Wikidata and YAGO4 can be fetched without using reification, to cater for applications that do not require metadata. To the best of our knowledge, this is the first paper that investigates the method and the extent of metadata coverage of two prominent KGs, Wikidata and YAGO4.
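Why reification "increases the magnitude of data" can be shown with a minimal sketch: standard RDF reification expands one triple into four triples about a statement node, to which provenance can then be attached. Triples are modelled here as plain Python tuples, and the `ex:` names are hypothetical placeholders.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def reify(triple, statement_id):
    """Standard RDF reification: one (s, p, o) fact becomes four triples
    describing an rdf:Statement node; metadata can then be attached to
    that node rather than to the (unaddressable) original triple."""
    s, p, o = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]

fact = ("ex:BarackObama", "ex:presidentOf", "ex:USA")  # illustrative fact
statement = reify(fact, "ex:stmt1")
# Provenance now attaches to the statement node, not the raw triple:
provenance = [("ex:stmt1", "ex:source", "ex:SomeSource")]
```

Wikidata's qualifier model achieves the same addressability without the four-triple blow-up, which is one reason the paper contrasts the two approaches.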
Affiliation(s)
- Sini Govindapillai
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia
- Lay-Ki Soon
- School of Information Technology, Monash University Malaysia, Bandar Sunway, Selangor, 47500, Malaysia
- Su-Cheng Haw
- Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia
10
Waagmeester A, Willighagen EL, Su AI, Kutmon M, Gayo JEL, Fernández-Álvarez D, Groom Q, Schaap PJ, Verhagen LM, Koehorst JJ. A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses. BMC Biol 2021; 19:12. [PMID: 33482803] [PMCID: PMC7820539] [DOI: 10.1186/s12915-020-00940-y]
Abstract
BACKGROUND Pandemics, even more than other medical problems, require swift integration of knowledge. When caused by a new virus, understanding the underlying biology may help find solutions. In a setting with a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions. RESULTS As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. We demonstrate how this model makes data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, were written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates. CONCLUSIONS Although this workflow was developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43).
Affiliation(s)
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Martina Kutmon
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Peter J Schaap
- Department of Agrotechnology and Food Sciences, Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
- Jasper J Koehorst
- Department of Agrotechnology and Food Sciences, Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Wageningen, The Netherlands
11
Fogli A, Aiello LM, Quercia D. Our dreams, our selves: automatic analysis of dream reports. Royal Society Open Science 2020; 7:192080. [PMID: 32968499] [PMCID: PMC7481704] [DOI: 10.1098/rsos.192080]
Abstract
Sleep scientists have shown that dreaming helps people improve their waking lives, and they have done so by developing sophisticated content analysis scales. Dream analysis entails time-consuming manual annotation of text, which is why dream reports have recently been mined with algorithms, mostly focused on identifying emotions. In doing so, researchers have not tackled two main technical challenges: (i) how to mine aspects of dream reports that research has found important, such as characters and interactions; and (ii) how to do so in a principled way grounded in the literature. To tackle these challenges, we designed a tool that automatically scores dream reports by operationalizing the widely used dream analysis scale by Hall and Van de Castle. We validated the tool's effectiveness on hand-annotated dream reports (the average error is 0.24), scored 24,000 reports (far more than any previous study), and tested what sleep scientists call the 'continuity hypothesis' at this unprecedented scale: we found supporting evidence that dreams are a continuation of what happens in everyday life. Our results suggest that it is possible to quantify important aspects of dreams, making it possible to build technologies that bridge the current gap between real life and dreaming.
Affiliation(s)
- Alessandro Fogli
- Computer Science Department, Università degli studi di Roma Tre, Rome, Italy
12
Haller A, Fernández JD, Kamdar MR, Polleres A. What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM Journal of Data and Information Quality 2020. [DOI: 10.1145/3369875]
Abstract
Linked Open Data promises guiding principles for publishing interlinked knowledge graphs on the Web as findable, accessible, interoperable, and reusable datasets. We argue that, while Linked Data may thus be viewed as a basis for instantiating the FAIR principles, a number of open issues still cause significant data quality problems even when knowledge graphs are published as Linked Data. First, a principled notion of what a dataset is, or, respectively, what links within and between datasets are, has been missing, making it hard to define the boundaries of single coherent knowledge graphs within Linked Data. Second, to enable FAIR knowledge graphs, Linked Data lacks standardised findability and accessibility mechanisms via a single entry link. To address the first issue, we (i) propose a rigorous definition of a naming authority for a Linked Data dataset, (ii) define different link types for data in Linked datasets, (iii) provide an empirical analysis of linkage among the datasets of the Linked Open Data cloud, and (iv) analyse the dereferenceability of those links. We base our analyses and link computations on a scalable mechanism implemented on top of the HDT format, which allows us to analyse the quantity and quality of different link types at scale.
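A naming authority can be approximated, very roughly, by the host of a URI; the paper's actual definition is more rigorous, but under that simplifying assumption a sketch of classifying a link as within-dataset or between-dataset looks like this (the example URIs are only illustrative):

```python
from urllib.parse import urlparse

def authority(uri):
    """Naming authority of a URI, simplified here to its lower-cased host."""
    return urlparse(uri).netloc.lower()

def link_type(subject_uri, object_uri):
    """Classify an RDF link as internal (subject and object share a naming
    authority, i.e. the link stays inside one dataset) or external (the
    link points into another dataset)."""
    if authority(subject_uri) == authority(object_uri):
        return "internal"
    return "external"
```

Counting external links per authority over a dump is then enough to reproduce the flavour of the paper's linkage analysis, though the real study distinguishes further link types and checks dereferenceability.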
Affiliation(s)
- Armin Haller
- Australian National University, Canberra, Australia
- Axel Polleres
- Vienna University of Economics and Business, Welthandelsplatz, Vienna, Austria
13
Rula A, Zaveri A, Simperl E, Demidova E. Editorial. ACM Journal of Data and Information Quality 2020. [DOI: 10.1145/3388748]
Abstract
This editorial summarizes the content of the Special Issue on Quality Assessment of Knowledge Graphs of the Journal of Data and Information Quality (JDIQ). We dedicate this special issue to the memory of our colleague and friend Amrapali Zaveri.
Affiliation(s)
- Anisa Rula
- University of Milano-Bicocca, Milan, Italy and University of Bonn, Bonn, Germany
- Elena Simperl
- King’s College London, Aldwych, London, United Kingdom
- Elena Demidova
- L3S Research Center, Leibniz Universität Hannover, Germany
14
Harth A, Kirrane S, Ngonga Ngomo AC, Paulheim H, Rula A, Gentile AL, Haase P, Cochez M. Hyperbolic Knowledge Graph Embeddings for Knowledge Base Completion. The Semantic Web 2020. [PMCID: PMC7250606] [DOI: 10.1007/978-3-030-49461-2_12]
Abstract
Learning embeddings of the entities and relations in knowledge bases allows the discovery of hidden patterns in them. In this work, we examine the contribution of geometrical space to the task of knowledge base completion. We focus on the family of translational models, whose performance has been lagging, and extend them to hyperbolic space so as to better reflect the topological properties of knowledge bases. We investigate the types of regularities that our model, dubbed HyperKG, can capture and show that it is a prominent candidate for effectively representing a subset of Datalog rules. We empirically show, using a variety of link prediction datasets, that hyperbolic space allows us to significantly narrow the performance gap between translational and bilinear models and to effectively represent certain types of rules.
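The hyperbolic space commonly used in such models is the Poincaré ball, whose geodesic distance is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). A minimal sketch of that metric (HyperKG's exact formulation and scoring function may differ):

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit
    (Poincare) ball, the standard model of hyperbolic space used for
    hyperbolic embeddings."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    return math.acosh(1 + 2 * diff_sq / ((1 - norm_u_sq) * (1 - norm_v_sq)))
```

Distances blow up near the boundary of the ball, giving the space exponentially growing "room" with radius; this is what lets tree-like hierarchies, and hence certain rule patterns, embed with low distortion.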
Affiliation(s)
- Andreas Harth
- University of Erlangen-Nuremberg, Nuremberg, Germany
- Sabrina Kirrane
- Vienna University of Economics and Business, Vienna, Austria
- Anisa Rula
- University of Milano-Bicocca, Milan, Italy
- Michael Cochez
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
15
Heibi I, Peroni S, Shotton D. Software review: COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations. Scientometrics 2019. [DOI: 10.1007/s11192-019-03217-6]
Abstract
In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (http://opencitations.net/index/coci). COCI is the first open citation index created by OpenCitations; in it, we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived from the data available in Crossref. These citations are described using the Resource Description Framework by means of the newly extended version of the OpenCitations Data Model (OCDM). We introduce the workflow we have developed for creating these data and present the additional services that facilitate access to and querying of the data via different access points: a SPARQL endpoint, a REST API, bulk downloads, Web interfaces, and direct access to the citations via HTTP content negotiation. Finally, we present statistics regarding the use of COCI citation data and introduce several projects that have already started to use COCI data for different purposes.
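Because COCI treats each citation as a first-class entity (identified by an Open Citation Identifier, OCI), a record from its REST API can be reduced to a directed citing-to-cited edge. A sketch with illustrative placeholder values (the `oci` string and the `10.1000/...` DOIs are not real identifiers):

```python
def citation_edge(record):
    """Extract a directed (citing DOI, cited DOI) pair from one
    COCI-style citation record."""
    return (record["citing"], record["cited"])

# Illustrative record shaped like COCI REST API output; values are placeholders.
sample = {
    "oci": "02001-02001",              # placeholder OCI, not a real identifier
    "citing": "10.1000/example.citing",
    "cited": "10.1000/example.cited",
    "creation": "2019",                # when the citing work was published
    "timespan": "P1Y",                 # citing/cited publication gap (ISO 8601 duration)
}
```

Collecting such edges over an API response is enough to build the citation graph locally; the bulk downloads mentioned in the abstract serve the same purpose at scale.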
16
Abstract
Purpose
The purpose of this paper is to review the current status of research on Wikidata and, in particular, of articles that either describe applications of Wikidata or provide empirical evidence, in order to uncover the topics of interest, the fields that are benefiting from its applications and which researchers and institutions are leading the work.
Design/methodology/approach
A systematic literature review is conducted to identify and review how Wikidata is being dealt with in academic research articles and the applications that are proposed. A rigorous and systematic process is implemented, aiming not only to summarize existing studies and research on the topic, but also to include an element of analytical criticism and a perspective on gaps and future research.
Findings
Despite Wikidata’s potential and the notable rise in research activity, the field is still in the early stages of study. Most research is published in conferences, highlighting this immaturity, and provides little empirical evidence of real use cases. Only a few disciplines currently benefit from Wikidata’s applications, and they do so with a significant gap between research and practice. Studies are dominated by European researchers, mirroring Wikidata’s content distribution and limiting its worldwide applications.
Originality/value
The results collect and summarize existing Wikidata research articles published in the major international journals and conferences, delivering a meticulous summary of all the available empirical research on the topic which is representative of the state of the art at this time, complemented by a discussion of identified gaps and future work.
17
A Wikidata-based tool for building and visualising narratives. International Journal on Digital Libraries 2019. [DOI: 10.1007/s00799-019-00266-3]
18
Putman T, Hybiske K, Jow D, Afrasiabi C, Lelong S, Cano MA, Stupp GS, Waagmeester A, Good BM, Wu C, Su AI. ChlamBase: a curated model organism database for the Chlamydia research community. Database (Oxford) 2019; 2019:baz041. [PMID: 30985891] [PMCID: PMC6463448] [DOI: 10.1093/database/baz041]
Abstract
The accelerating growth of genomic and proteomic information for Chlamydia species, coupled with unique biological aspects of these pathogens, necessitates bioinformatic tools and features that are not provided by major public databases. To meet these growing needs, we developed ChlamBase, a model organism database for Chlamydia that is built upon the WikiGenomes application framework, and Wikidata, a community-curated database. ChlamBase was designed to serve as a central access point for genomic and proteomic information for the Chlamydia research community. ChlamBase integrates information from numerous external databases, as well as important data extracted from the literature that are otherwise not available in structured formats that are easy to use. In addition, a key feature of ChlamBase is that it empowers users in the field to contribute new annotations and data as the field advances with continued discoveries. ChlamBase is freely and publicly available at chlambase.org.
Affiliation(s)
- Tim Putman
- Ontology Development Group, Library, Oregon Health and Science University, Portland, OR, USA
| | - Kevin Hybiske
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, WA, USA
| | - Derek Jow
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Cyrus Afrasiabi
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Sebastien Lelong
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Marco Alvarado Cano
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Gregory S Stupp
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | | | - Benjamin M Good
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| |
Collapse
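The Wikidata backing described in the ChlamBase abstract can be queried programmatically through the public Wikidata Query Service. A minimal sketch follows: the SPARQL endpoint URL and property P703 ("found in taxon") are real Wikidata conventions, but the taxon QID passed in is a placeholder assumption, not a value taken from ChlamBase itself.

```python
# Sketch: querying Wikidata's SPARQL endpoint for genes annotated to a taxon.
# P703 ("found in taxon") and the wdt:/wd: prefixes are standard Wikidata
# usage; the QID used at the bottom is a placeholder, not a real Chlamydia item.
import urllib.parse

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_gene_query(taxon_qid: str, limit: int = 10) -> str:
    """Return a SPARQL query listing genes found in the given taxon."""
    return f"""
    SELECT ?gene ?geneLabel WHERE {{
      ?gene wdt:P703 wd:{taxon_qid} .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }} LIMIT {limit}
    """

def request_url(query: str) -> str:
    """Build the GET URL that returns results as JSON."""
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )

def parse_labels(result_json: dict) -> list[str]:
    """Extract gene labels from a WDQS JSON response."""
    return [
        b["geneLabel"]["value"]
        for b in result_json["results"]["bindings"]
    ]

if __name__ == "__main__":
    print(request_url(build_gene_query("Q000000")))  # placeholder QID
```

Issuing the request (e.g. with `urllib.request.urlopen`) returns JSON whose `results.bindings` list feeds directly into `parse_labels`.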
|
19
|
Nechaev Y, Corcoglioniti F, Giuliano C. SocialLink: exploiting graph embeddings to link DBpedia entities to Twitter profiles. PROGRESS IN ARTIFICIAL INTELLIGENCE 2018. [DOI: 10.1007/s13748-018-0160-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
20
|
Malyshev S, Krötzsch M, González L, Gonsior J, Bielefeldt A. Getting the Most Out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph. LECTURE NOTES IN COMPUTER SCIENCE 2018. [DOI: 10.1007/978-3-030-00668-6_23] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
|
21
|
Abstract
INTRODUCTION: With the emergence of the 'big data' era, the biomedical research community has great interest in exploiting publicly available chemical information for drug discovery. PubChem is an example of a public database that provides a large amount of chemical information free of charge. AREAS COVERED: This article provides an overview of how PubChem's data, tools, and services can be used for virtual screening and reviews recent publications that discuss important aspects of exploiting PubChem for drug discovery. EXPERT OPINION: PubChem offers comprehensive chemical information useful for drug discovery. It also provides multiple programmatic access routes, which are essential to build automated virtual screening pipelines that exploit PubChem data. In addition, PubChemRDF allows users to download PubChem data and load them into a local computing facility, facilitating data integration between PubChem and other resources. PubChem resources have been used in many studies for developing bioactivity and toxicity prediction models, discovering polypharmacologic (multi-target) ligands, and identifying new macromolecule targets of compounds (for drug repurposing or off-target side effect prediction). These studies demonstrate the usefulness of PubChem as a key resource for computer-aided drug discovery and related areas.
Affiliation(s)
- Sunghwan Kim: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
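The programmatic access routes mentioned in the abstract above include PubChem's PUG REST interface. A minimal sketch of its URL scheme, assuming the documented `/compound/cid/{cid}/property/.../JSON` pattern; CID 2244 (aspirin) is used purely as an illustrative input:

```python
# Sketch of a PubChem PUG REST request URL for compound properties.
# The base URL and path pattern follow PubChem's PUG REST conventions.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def property_url(cid: int, properties: list[str]) -> str:
    """Build a PUG REST URL requesting compound properties as JSON."""
    props = ",".join(properties)
    return f"{PUG_REST}/compound/cid/{cid}/property/{props}/JSON"

if __name__ == "__main__":
    url = property_url(2244, ["MolecularFormula", "MolecularWeight"])
    print(url)
    # Fetching is then a single GET request, e.g.:
    # with urllib.request.urlopen(url) as r:
    #     data = r.read().decode()
```

Pipelines of this shape are what the abstract means by "automated virtual screening pipelines that exploit PubChem data": build the URL, fetch JSON, feed the properties into a model.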
|
22
|
Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH. PubChem Substance and Compound databases. Nucleic Acids Res 2016; 44:D1202-13. [PMID: 26400175 PMCID: PMC4702940 DOI: 10.1093/nar/gkv951] [Citation(s) in RCA: 2758] [Impact Index Per Article: 344.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2015] [Revised: 09/10/2015] [Accepted: 09/11/2015] [Indexed: 11/13/2022] Open
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). For the past 11 years, PubChem has grown to a sizable system, serving as a chemical information resource for the scientific research community. PubChem consists of three inter-linked databases, Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. Biological activity data of chemical substances tested in assay experiments are contained in the BioAssay database. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also gives a brief description of PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, as well as PubChemRDF, Resource Description Framework (RDF)-formatted PubChem data for data sharing, analysis and integration with information contained in other databases.
Affiliation(s)
- Sunghwan Kim, Paul A Thiessen, Evan E Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A Shoemaker, Jiyao Wang, Bo Yu, Jian Zhang, Stephen H Bryant: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD 20894, USA
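The abstract above notes that PubChemRDF publishes PubChem data as RDF for local loading and integration. Once downloaded, even a plain N-Triples file can be scanned with the standard library alone; the sample line below is an illustrative stand-in with the PubChemRDF compound-IRI shape, not verbatim PubChemRDF output.

```python
# Sketch: a tiny N-Triples reader for locally downloaded RDF data.
# Handles only IRI subjects/predicates/objects, which is enough to
# illustrate loading triples into a local structure for integration.
SAMPLE_NTRIPLES = """\
<http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID2244> <http://example.org/hasDescriptor> <http://rdf.ncbi.nlm.nih.gov/pubchem/descriptor/CID2244_Molecular_Weight> .
"""

def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Parse N-Triples lines of three IRIs into (subject, predicate, object)."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        s, p, o = (t.strip("<>") for t in line.rstrip(" .").split(None, 2))
        triples.append((s, p, o))
    return triples
```

A real pipeline would use a full RDF parser, but the point stands: PubChemRDF data reduces to subject-predicate-object rows that join naturally against other local resources.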
|
23
|
Hernández D, Hogan A, Riveros C, Rojas C, Zerega E. Querying Wikidata: Comparing SPARQL, Relational and Graph Databases. LECTURE NOTES IN COMPUTER SCIENCE 2016. [DOI: 10.1007/978-3-319-46547-0_10] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
|