1
Rossini M, Montanaro G, Montreuil O, Tarasov S. Towards computable taxonomic knowledge: Leveraging nanopublications for sharing new synonyms in the Madagascan genus Helictopleurus (Coleoptera, Scarabaeinae). Biodivers Data J 2024; 12:e120304. [PMID: 38912110 PMCID: PMC11193050 DOI: 10.3897/bdj.12.e120304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 05/14/2024] [Indexed: 06/25/2024] Open
Abstract
Background: Numerous taxonomic studies have focused on the dung beetle genus Helictopleurus d'Orbigny, 1915, endemic to Madagascar. However, this genus still needs a thorough revision. Semantic technologies, such as nanopublications, hold the potential to enhance taxonomy by transforming how data are published and analysed. This paper evaluates the effectiveness of nanopublications in establishing synonyms within the genus Helictopleurus. New information: In this study, we identify four new synonyms within Helictopleurus: H. rudicollis (Fairmaire, 1898) = H. hypocrita Balthasar, 1941 syn. nov.; H. vadoni Lebis, 1960 = H. perpunctatus Balthasar, 1963 syn. nov.; H. halffteri Balthasar, 1964 = H. dorbignyi Montreuil, 2005 syn. nov.; H. clouei (Harold, 1869) = H. gibbicollis (Fairmaire, 1895) syn. nov. Helictopleurus may have a significantly larger number of synonyms than currently known, indicating potentially inaccurate estimates about its recent extinction. We also publish the newly established synonyms as nanopublications, which are machine-readable data snippets accessible online. Additionally, we explore the utility of nanopublications in taxonomy and demonstrate their practical use with an example query for data extraction.
Affiliation(s)
- Michele Rossini
- Finnish Museum of Natural History (LUOMUS), University of Helsinki, Helsinki, Finland
- Department of Agronomy, Food, Natural resources, Animals and Environment (DAFNAE), University of Padova, Padova, Italy
- Giulio Montanaro
- Finnish Museum of Natural History (LUOMUS), University of Helsinki, Helsinki, Finland
- Olivier Montreuil
- Muséum National d'Histoire Naturelle, Paris, France
- Sergei Tarasov
- Finnish Museum of Natural History (LUOMUS), University of Helsinki, Helsinki, Finland
2
Morley J, Hamilton N, Floridi L. Selling NHS patient data. BMJ 2024; 384:q420. [PMID: 38387965 DOI: 10.1136/bmj.q420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2024]
Affiliation(s)
- Jessica Morley
- Digital Ethics Center, Yale University, New Haven, CT, USA
- Luciano Floridi
- Digital Ethics Center, Yale University, New Haven, CT, USA
- Department of Legal Studies, University of Bologna, Bologna, Italy
3
Schultes E, Roos M, Bonino da Silva Santos LO, Guizzardi G, Bouwman J, Hankemeier T, Baak A, Mons B. FAIR Digital Twins for Data-Intensive Research. Front Big Data 2022; 5:883341. [PMID: 35647536 PMCID: PMC9130601 DOI: 10.3389/fdata.2022.883341] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/24/2022] [Accepted: 04/12/2022] [Indexed: 11/13/2022] Open
Abstract
Although all the technical components supporting fully orchestrated Digital Twins (DT) currently exist, what remains missing is a conceptual clarification and analysis of a more generalized concept of a DT that is made FAIR, that is, universally machine actionable. This methodological overview is a first step toward this clarification. We present a review of previously developed semantic artifacts and how they may be used to compose a higher-order data model referred to here as a FAIR Digital Twin (FDT). We propose an architectural design to compose, store and reuse FDTs supporting data intensive research, with emphasis on privacy by design and their use in GDPR compliant open science.
4
Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements. JOURNAL OF DATA AND INFORMATION SCIENCE 2022. [DOI: 10.2478/jdis-2022-0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
Purpose
Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.
Design/methodology/approach
Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject–object pairs (SO pairs).
Findings
The results indicated extraordinary growth in cardiovascular publications in China but only modest growth in novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the top 10 SO pairs with the highest IE, which implied pluralism of epistemic status. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.
Research limitations
The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.
Practical implications
Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. These areas are suggested to be prioritized; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.
Originality/value
We provided a novel approach by combining natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
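As a rough illustration of the entropy-based measure described in this abstract, the sketch below computes Shannon entropy over the epistemic-status labels attached to each SO pair and ranks the pairs by it. The abstract does not give the exact formula, so the use of base-2 Shannon entropy over label frequencies is an assumption, and the function name `information_entropy`, the label set, and the example SO pairs are all hypothetical.

```python
import math
from collections import Counter


def information_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels.

    Assumption: each label is the epistemic status of one statement
    mentioning a given subject-object (SO) pair.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # Each term is p * log2(1 / p); an empty input yields 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical data: epistemic-status labels per SO pair (not from the study).
so_pair_statements = {
    ("biomarker_X", "disease_Y"): ["certain", "hedging", "conflicting", "unknown"],
    ("drug_A", "disease_B"): ["certain", "certain", "certain", "certain"],
}

# Entropy per SO pair, then pairs ranked from most to least uncertain.
ie_by_pair = {
    pair: information_entropy(labels)
    for pair, labels in so_pair_statements.items()
}
ranked = sorted(ie_by_pair.items(), key=lambda item: item[1], reverse=True)

for pair, ie in ranked:
    print(pair, round(ie, 3))
```

Under this reading, an SO pair whose statements all share one epistemic status scores zero, while a pair whose statements are spread evenly across several statuses scores highest, which matches the abstract's interpretation of high IE as epistemic pluralism.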
5
Kuhn T, Taelman R, Emonet V, Antonatos H, Soiland-Reyes S, Dumontier M. Semantic micro-contributions with decentralized nanopublication services. PeerJ Comput Sci 2021; 7:e387. [PMID: 33817033 PMCID: PMC7959648 DOI: 10.7717/peerj-cs.387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/26/2020] [Accepted: 01/19/2021] [Indexed: 06/12/2023]
Abstract
While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.
Affiliation(s)
- Tobias Kuhn
- Department of Computer Science, VU Amsterdam, Amsterdam, Netherlands
- Vincent Emonet
- Institute of Data Science, Maastricht University, Maastricht, Netherlands
- Stian Soiland-Reyes
- Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
- Department of Computer Science, The University of Manchester, Manchester, UK
- Michel Dumontier
- Institute of Data Science, Maastricht University, Maastricht, Netherlands
6
Giachelle F, Dosso D, Silvello G. Search, access, and explore life science nanopublications on the Web. PeerJ Comput Sci 2021; 7:e335. [PMID: 33816986 PMCID: PMC7959622 DOI: 10.7717/peerj-cs.335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2020] [Accepted: 11/20/2020] [Indexed: 06/12/2023]
Abstract
Nanopublications are Resource Description Framework (RDF) graphs encoding scientific facts extracted from the literature and enriched with provenance and attribution information. There are millions of nanopublications currently available on the Web, especially in the life science domain. Nanopublications are thought to facilitate the discovery, exploration, and re-use of scientific facts. Nevertheless, they are still not widely used by scientists outside specific circles; they are hard to find and rarely cited. We believe this is due to the lack of services to seek, find and understand nanopublications' content. To this end, we present the NanoWeb application to seamlessly search, access, explore, and re-use the nanopublications publicly available on the Web. For the time being, NanoWeb focuses on the life science domain, where the largest number of nanopublications is available. It is a unified access point to the world of nanopublications, enabling search over graph data, direct connections to evidence papers and curated scientific databases, and visual and intuitive exploration of the relation network created by the encoded scientific facts.
Affiliation(s)
- Fabio Giachelle
- Department of Information Engineering, University of Padua, Padova, Italy
- Dennis Dosso
- Department of Information Engineering, University of Padua, Padova, Italy
- Gianmaria Silvello
- Department of Information Engineering, University of Padua, Padova, Italy
7
Li X, Rousseau JF, Ding Y, Song M, Lu W. Understanding Drug Repurposing From the Perspective of Biomedical Entities and Their Evolution: Bibliographic Research Using Aspirin. JMIR Med Inform 2020; 8:e16739. [PMID: 32543442 PMCID: PMC7327595 DOI: 10.2196/16739] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/19/2019] [Revised: 01/08/2020] [Accepted: 03/31/2020] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Drug development is still a costly and time-consuming process with a low rate of success. Drug repurposing (DR) has attracted significant attention because of its significant advantages over traditional approaches in terms of development time, cost, and safety. Entitymetrics, defined as bibliometric indicators based on biomedical entities (eg, diseases, drugs, and genes) studied in the biomedical literature, make it possible for researchers to measure knowledge evolution and the transfer of drug research. OBJECTIVE The purpose of this study was to understand DR from the perspective of biomedical entities (diseases, drugs, and genes) and their evolution. METHODS In the work reported in this paper, we extended the bibliometric indicators of biomedical entities mentioned in PubMed to detect potential patterns of biomedical entities in various phases of drug research and investigate the factors driving DR. We used aspirin (acetylsalicylic acid) as the subject of the study since it can be repurposed for many applications. We propose 4 easy, transparent measures based on entitymetrics to investigate DR for aspirin: Popularity Index (P1), Promising Index (P2), Prestige Index (P3), and Collaboration Index (CI). RESULTS We found that the maxima of P1, P3, and CI are closely associated with the different repurposing phases of aspirin. These metrics enabled us to observe the way in which biomedical entities interacted with the drug during the various phases of DR and to analyze the potential driving factors for DR at the entity level. P1 and CI were indicative of the dynamic trends of a specific biomedical entity over a long time period, while P2 was more sensitive to immediate changes. P3 reflected the early signs of the practical value of biomedical entities and could be valuable for tracking the research frontiers of a drug. 
CONCLUSIONS In-depth studies of side effects and mechanisms, fierce market competition, and advanced life science technologies are driving factors for DR. This study showcases the way in which researchers can examine the evolution of DR using entitymetrics, an approach that can be valuable for enhancing decision making in the field of drug discovery and development.
Affiliation(s)
- Xin Li
- Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China; School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, United States
- Justin F Rousseau
- Department of Population Health and Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
- Ying Ding
- School of Information, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
- Min Song
- Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
- Wei Lu
- Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China
8
Sustkova HP, Hettne KM, Wittenburg P, Jacobsen A, Kuhn T, Pergl R, Slifka J, McQuilton P, Magagna B, Sansone SA, Stocker M, Imming M, Lannom L, Musen M, Schultes E. FAIR Convergence Matrix: Optimizing the Reuse of Existing FAIR-Related Resources. DATA INTELLIGENCE 2020. [DOI: 10.1162/dint_a_00038] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022] Open
Abstract
The FAIR principles articulate the behaviors expected from digital artifacts that are Findable, Accessible, Interoperable and Reusable by machines and by people. Although by now widely accepted, the FAIR Principles by design do not explicitly consider actual implementation choices enabling FAIR behaviors. As different communities have their own, often well-established implementation preferences and priorities for data reuse, coordinating a broadly accepted, widely used FAIR implementation approach remains a global challenge. In an effort to accelerate broad community convergence on FAIR implementation options, the GO FAIR community has launched the development of the FAIR Convergence Matrix. The Matrix is a platform that compiles for any community of practice, an inventory of their self-declared FAIR implementation choices and challenges. The Convergence Matrix is itself a FAIR resource, openly available, and encourages voluntary participation by any self-identified community of practice (not only the GO FAIR Implementation Networks). Based on patterns of use and reuse of existing resources, the Convergence Matrix supports the transparent derivation of strategies that optimally coordinate convergence on standards and technologies in the emerging Internet of FAIR Data and Services.
Affiliation(s)
- Hana Pergl Sustkova
- GO FAIR International Support and Coordination Office, Leiden, The Netherlands
- Kristina Maria Hettne
- Centre for Digital Scholarship, Leiden University Libraries, Leiden, The Netherlands
- Peter Wittenburg
- Max Planck Computing and Data Facility, Gießenbachstraße 2, 85748 Garching, Germany
- Annika Jacobsen
- Leiden University Medical Center, Leiden, 2333 ZA, The Netherlands
- Tobias Kuhn
- Department of Computer Science, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV Amsterdam, The Netherlands
- Robert Pergl
- Czech Technical University in Prague, Faculty of Information Technology (FIT CTU), 160 00 Prague 6, Czech Republic
- Jan Slifka
- Czech Technical University in Prague, Faculty of Information Technology (FIT CTU), 160 00 Prague 6, Czech Republic
- Peter McQuilton
- Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK
- Susanna-Assunta Sansone
- Oxford e-Research Centre, Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK
- Markus Stocker
- TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
- Larry Lannom
- Corporation for National Research Initiatives (CNRI), Reston, Virginia 20191, USA
- Mark Musen
- Stanford Center for Biomedical Informatics Research, Stanford, CA 94305, USA
- Erik Schultes
- GO FAIR International Support and Coordination Office, Leiden, The Netherlands
9
Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, Mélius J, Cirillo E, Coort SL, Digles D, Ehrhart F, Giesbertz P, Kalafati M, Martens M, Miller R, Nishida K, Rieswijk L, Waagmeester A, Eijssen LMT, Evelo CT, Pico AR, Willighagen EL. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res 2019; 46:D661-D667. [PMID: 29136241 PMCID: PMC5753270 DOI: 10.1093/nar/gkx1064] [Citation(s) in RCA: 590] [Impact Index Per Article: 118.0] [Received: 09/14/2017] [Accepted: 10/25/2017] [Indexed: 02/06/2023] Open
Abstract
WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.
Affiliation(s)
- Denise N Slenter
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Martina Kutmon
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
- Anders Riutta
- Gladstone Institutes, San Francisco, California, CA 94158, USA
- Jacob Windsor
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Nuno Nunes
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Jonathan Mélius
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Elisa Cirillo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Susan L Coort
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Daniela Digles
- University of Vienna, Department of Pharmaceutical Chemistry, 1090 Vienna, Austria
- Friederike Ehrhart
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Pieter Giesbertz
- Chair of Nutritional Physiology, Technische Universität München, 85350 Freising, Germany
- Marianthi Kalafati
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
- Marvin Martens
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Ryan Miller
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
- Kozo Nishida
- Laboratory for Biochemical Simulation, RIKEN Quantitative Biology Center, Suita, Osaka 565-0874, Japan
- Linda Rieswijk
- Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, CA 94720, USA
- Andra Waagmeester
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Micelio, Antwerp, Belgium
- Lars M T Eijssen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; School for Mental Health and Neuroscience, Department of Psychiatry and Neuropsychology, Maastricht University Medical Centre, 6229 ER Maastricht, The Netherlands
- Chris T Evelo
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
10
Abstract
The amount of omics data in the public domain is increasing every year. Modern science has become a data-intensive discipline. Innovative solutions for data management, data sharing, and for discovering novel datasets are therefore increasingly required. In 2016, we released the first version of the Omics Discovery Index (OmicsDI) as a light-weight system to aggregate datasets across multiple public omics data resources. OmicsDI aggregates genomics, transcriptomics, proteomics, metabolomics and multiomics datasets, as well as computational models of biological processes. Here, we propose a set of novel metrics to quantify the attention and impact of biomedical datasets. A complete framework (now integrated into OmicsDI) has been implemented in order to provide and evaluate those metrics. Finally, we propose a set of recommendations for authors, journals and data resources to promote an optimal quantification of the impact of datasets. Increasing amount of public omics data are important and valuable resources for the research community. Here, the authors develop a set of metrics to quantify the attention and impact of biomedical datasets and integrate them into the framework of Omics Discovery Index (OmicsDI).
11
Townend GS, Ehrhart F, van Kranen HJ, Wilkinson M, Jacobsen A, Roos M, Willighagen EL, van Enckevort D, Evelo CT, Curfs LMG. MECP2 variation in Rett syndrome-An overview of current coverage of genetic and phenotype data within existing databases. Hum Mutat 2018; 39:914-924. [PMID: 29704307 PMCID: PMC6033003 DOI: 10.1002/humu.23542] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/22/2017] [Revised: 04/18/2018] [Accepted: 04/23/2018] [Indexed: 12/30/2022]
Abstract
Rett syndrome (RTT) is a monogenic rare disorder that causes severe neurological problems. In most cases, it results from a loss-of-function mutation in the gene encoding methyl-CpG-binding protein 2 (MECP2). Currently, about 900 unique MECP2 variations (benign and pathogenic) have been identified and it is suspected that the different mutations contribute to different levels of disease severity. For researchers and clinicians, it is important that genotype-phenotype information is available to identify disease-causing mutations for diagnosis, to aid in clinical management of the disorder, and to provide counseling for parents. In this study, 13 genotype-phenotype databases were surveyed for their general functionality and availability of RTT-specific MECP2 variation data. For each database, we investigated findability and interoperability alongside practical user functionality, and type and amount of genetic and phenotype data. The main conclusions are that these databases, and the specific MECP2 variants held within them, are challenging to find, and that interoperability is as yet poorly developed, so searching across databases requires effort. Nevertheless, we found several thousand online database entries for MECP2 variations and their associated phenotypes, diagnosis, or predicted variant effects, which is a good starting point for researchers and clinicians who want to provide, annotate, and use the data.
Affiliation(s)
- Gillian S Townend
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands
- Friederike Ehrhart
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands; Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Henk J van Kranen
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands; Institute for Public Health Genomics, Maastricht University, Maastricht, The Netherlands
- Mark Wilkinson
- Center for Plant Biotechnology and Genomics, Universidad Politécnica de Madrid, Madrid, Spain
- Annika Jacobsen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- David van Enckevort
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Chris T Evelo
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands; Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Leopold M G Curfs
- Rett Expertise Centre Netherlands - GKC, Maastricht University Medical Center, Maastricht, The Netherlands
12
Advancing food, nutrition, and health research in Europe by connecting and building research infrastructures in a DISH-RI: Results of the EuroDISH project. Trends Food Sci Technol 2018. [DOI: 10.1016/j.tifs.2017.12.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]
13
Guitton Y, Tremblay-Franco M, Le Corguillé G, Martin JF, Pétéra M, Roger-Mele P, Delabrière A, Goulitquer S, Monsoor M, Duperier C, Canlet C, Servien R, Tardivel P, Caron C, Giacomoni F, Thévenot EA. Create, run, share, publish, and reference your LC–MS, FIA–MS, GC–MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics. Int J Biochem Cell Biol 2017; 93:89-101. [DOI: 10.1016/j.biocel.2017.07.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 01/28/2017] [Revised: 06/14/2017] [Accepted: 07/10/2017] [Indexed: 12/11/2022]
14
Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer. BIOMED RESEARCH INTERNATIONAL 2017; 2017:8327980. [PMID: 29214177 PMCID: PMC5682045 DOI: 10.1155/2017/8327980] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/03/2017] [Revised: 06/11/2017] [Accepted: 10/02/2017] [Indexed: 12/28/2022]
Abstract
Patient registries are an essential tool to increase current knowledge regarding rare diseases. Understanding these data is a vital step to improve patient treatments and to create the most adequate tools for personalized medicine. However, the growing number of disease-specific patient registries also brings new technical challenges. Usually, these systems are developed as closed data silos, with independent formats and models, lacking comprehensive mechanisms to enable data sharing. To tackle these challenges, we developed a Semantic Web-based solution that connects distributed and heterogeneous registries, enabling the federation of knowledge between multiple independent environments. This semantic layer creates a holistic view over a set of anonymised registries, supporting semantic data representation, integrated access, and querying. The implemented system gave us the opportunity to answer challenging questions across dispersed rare disease patient registries. Interconnecting those registries using Semantic Web technologies lets us query single or multiple instances according to our needs. The outcome is a unique semantic layer, connecting miscellaneous registries and delivering a lightweight holistic perspective over the wealth of knowledge stemming from linked rare disease patient registries.
15
McKiernan EC, Marrone DF. CA1 pyramidal cells have diverse biophysical properties, affected by development, experience, and aging. PeerJ 2017; 5:e3836. [PMID: 28948109 PMCID: PMC5609525 DOI: 10.7717/peerj.3836] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/04/2017] [Accepted: 08/31/2017] [Indexed: 12/04/2022] Open
Abstract
Neuron types (e.g., pyramidal cells) within one area of the brain are often considered homogeneous, despite variability in their biophysical properties. Here we review literature demonstrating variability in the electrical activity of CA1 hippocampal pyramidal cells (PCs), including responses to somatic current injection, synaptic stimulation, and spontaneous network-related activity. In addition, we describe how responses of CA1 PCs vary with development, experience, and aging, and some of the underlying ionic currents responsible. Finally, we suggest directions that may be the most impactful in expanding this knowledge, including the use of text and data mining to systematically study cellular heterogeneity in more depth; dynamical systems theory to understand and potentially classify neuron firing patterns; and mathematical modeling to study the interaction between cellular properties and network output. Our goals are to provide a synthesis of the literature for experimentalists studying CA1 PCs, to give theorists an idea of the rich diversity of behaviors models may need to reproduce to accurately represent these cells, and to provide suggestions for future research.
Affiliation(s)
- Erin C McKiernan
- Departamento de Física, Facultad de Ciencias, Universidad Nacional Autónoma de México, Ciudad de México, México
- Diano F Marrone
- Department of Psychology, Wilfrid Laurier University, Waterloo, Ontario, Canada; McKnight Brain Institute, University of Arizona, Tucson, AZ, United States of America
|
16
|
López-Massaguer O, Sanz F, Pastor M. An automated tool for obtaining QSAR-ready series of compounds using semantic web technologies. Bioinformatics 2017; 34:131-133. [DOI: 10.1093/bioinformatics/btx566] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/25/2017] [Accepted: 09/06/2017] [Indexed: 11/13/2022] Open
Affiliation(s)
- Oriol López-Massaguer
- Research Programme on Biomedical Informatics (GRIB), Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), Dept. of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), Dept. of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Manuel Pastor
- Research Programme on Biomedical Informatics (GRIB), Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), Dept. of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
|
17
|
Ding Y, Stirling K. Data-driven Discovery: A New Era of Exploiting the Literature and Data. Journal of Data and Information Science 2017. [DOI: 10.20309/jdis.201622] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022] Open
Abstract
In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to help the understanding of challenges of data-driven discovery.
The Panama Canal, the 77-kilometer waterway connecting the Atlantic and Pacific oceans, has played a crucial role in international trade for more than a century. However, digging the Panama Canal was an exceedingly challenging process. A French effort in the late 19th century was abandoned because of equipment issues and a significant loss of labor due to tropical diseases transmitted by mosquitoes. The United States officially took control of the project in 1902. The United States replaced the unusable French equipment with new construction equipment that was designed for a much larger and faster scale of work. Colonel William C. Gorgas was appointed as the chief sanitation officer and charged with eliminating mosquito-spread illnesses. After overcoming these and additional trials and tribulations, the Canal successfully opened on August 15, 1914. The triumphant completion of the Panama Canal demonstrates that using the right tools and eliminating significant threats are critical steps in any project.
More than 100 years later, a paradigm shift is occurring, as we move into a data-centered era. Today, data are extremely rich but overwhelming, and extracting information out of data requires not only the right tools and methods but also awareness of major threats. In this data-intensive era, the traditional method of exploring the related publications and available datasets from previous experiments to arrive at a testable hypothesis is becoming obsolete. Consider the fact that a new article is published every 30 seconds (Jinha, 2010). In fact, for the common disease of diabetes, there have been roughly 500,000 articles published to date; even if a scientist reads 20 papers per day, he will need 68 years to wade through all the material. The standard method simply cannot sufficiently deal with the large volume of documents or the exponential growth of datasets. A major threat is that the canon of domain knowledge cannot be consumed and held in human memory. Without efficient methods to process information and without a way to eliminate the fundamental threat of limited memory and time to handle the data deluge, we may find ourselves facing failure as the French did on the Isthmus of Panama more than a century ago.
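The reading-time figure above can be checked with a short calculation. The sketch below is illustrative only: it assumes a reader who works every day of a 365-day year, and the function name `years_to_read` is hypothetical rather than anything taken from the article.

```python
def years_to_read(article_count: int, papers_per_day: int, days_per_year: int = 365) -> float:
    """Years needed to read `article_count` papers at a fixed daily pace."""
    days_needed = article_count / papers_per_day
    return days_needed / days_per_year


# Figures quoted in the text: roughly 500,000 diabetes articles, 20 papers per day.
# The 365-day reading year is an assumption made for this sketch.
years = years_to_read(500_000, 20)
print(f"{years:.1f} years")  # about 68.5 years, in line with the 68 quoted in the text
```

Under these assumptions the pace works out to 25,000 reading days, so the quoted figure holds only if no new articles appear in the meantime.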
Scouring the literature and data to generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets (Evans & Foster, 2011). In reality, most scholars have never been able to keep completely up-to-date with publications and datasets considering the unending increase in quantity and diversity of research within their own areas of focus, let alone in related conceptual areas in which knowledge may be segregated by syntactically impenetrable keyword barriers or an entirely different research corpus.
Research communities in many disciplines are finally recognizing that, with advances in information technology, there need to be new ways to extract entities from increasingly data-intensive publications and to integrate and analyze large-scale datasets. This provides a compelling opportunity to improve the process of knowledge discovery from the literature and datasets through use of knowledge graphs and an associated framework that integrates scholars, domain knowledge, datasets, workflows, and machines on a scale previously beyond our reach (Ding et al., 2013).
Affiliation(s)
- Ying Ding
- Department of Information and Library Science, Indiana University, Bloomington, IN 47405, USA
- Kyle Stirling
- Department of Information and Library Science, Indiana University, Bloomington, IN 47405, USA
|
18
|
Hassani-Pak K, Rawlings C. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes. J Integr Bioinform 2017; 14. [PMID: 28609292 PMCID: PMC6042805 DOI: 10.1515/jib-2016-0002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/12/2017] [Accepted: 02/16/2017] [Indexed: 02/06/2023] Open
Abstract
Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
|
19
|
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
|
20
|
Penev L, Georgiev T, Geshev P, Demirov S, Senderov V, Kuzmova I, Kostadinova I, Peneva S, Stoev P. ARPHA-BioDiv: A toolbox for scholarly publication and dissemination of biodiversity data based on the ARPHA Publishing Platform. Research Ideas and Outcomes 2017. [DOI: 10.3897/rio.3.e13088] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/31/2022] Open
|
21
|
Goldmann D, Zdrazil B, Digles D, Ecker GF. Empowering pharmacoinformatics by linked life science data. J Comput Aided Mol Des 2017; 31:319-328. [PMID: 27830428 PMCID: PMC5385323 DOI: 10.1007/s10822-016-9990-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2016] [Accepted: 10/24/2016] [Indexed: 11/11/2022]
Abstract
With the public availability of large data sources such as ChEMBLdb and the Open PHACTS Discovery Platform, retrieval of data sets for certain protein targets of interest with consistent assay conditions is no longer a time-consuming process. In particular, the use of workflow engines such as KNIME or Pipeline Pilot allows complex queries and enables simultaneous searches for several targets. Data can then directly be used as input to various ligand- and structure-based studies. In this contribution, using in-house projects on P-gp inhibition, transporter selectivity, and TRPV1 modulation, we outline how the incorporation of linked life science data in the daily execution of projects allowed us to expand our approaches from conventional Hansch analysis to complex, integrated multilayer models.
Affiliation(s)
- Daria Goldmann
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
- Barbara Zdrazil
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
- Daniela Digles
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
- Gerhard F Ecker
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
|
22
|
Tripathi S, Vercruysse S, Chawla K, Christie KR, Blake JA, Huntley RP, Orchard S, Hermjakob H, Thommesen L, Lægreid A, Kuiper M. Gene regulation knowledge commons: community action takes care of DNA binding transcription factors. Database: The Journal of Biological Databases and Curation 2016; 2016:baw088. [PMID: 27270715 PMCID: PMC4911790 DOI: 10.1093/database/baw088] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/08/2015] [Accepted: 05/05/2016] [Indexed: 12/23/2022]
Abstract
A large gap remains between the amount of knowledge in scientific literature and the fraction that gets curated into standardized databases, despite many curation initiatives. Yet the availability of comprehensive knowledge in databases is crucial for exploiting existing background knowledge, both for designing follow-up experiments and for interpreting new experimental data. Structured resources also underpin the computational integration and modeling of regulatory pathways, which further aids our understanding of regulatory dynamics. We argue that cooperation between the scientific community and professional curators can increase the capacity of capturing precise knowledge from literature. We demonstrate this with a project in which we mobilize biological domain experts who curate large amounts of DNA binding transcription factors, and show that they, although new to the field of curation, can make valuable contributions by harvesting reported knowledge from scientific papers. Such community curation can enhance the scientific epistemic process. Database URL: http://www.tfcheckpoint.org
Affiliation(s)
- Sushil Tripathi
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
- Steven Vercruysse
- Department of Biology, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
- Konika Chawla
- Department of Biology, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
- Karen R Christie
- Department of Computational Biology and Bioinformatics, The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, USA
- Judith A Blake
- Department of Computational Biology and Bioinformatics, The Jackson Laboratory, 600 Main Street, Bar Harbor, ME, USA
- Rachael P Huntley
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science University College, London WC1E 6JF, UK
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Liv Thommesen
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway; Department of Medical Laboratory Technology, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
- Astrid Lægreid
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology (NTNU), 7491 Trondheim, Norway
|
23
|
de Leeuw N, Dijkhuizen T, Hehir-Kwa JY, Carter NP, Feuk L, Firth HV, Kuhn RM, Ledbetter DH, Martin CL, van Ravenswaaij-Arts CMA, Scherer SW, Shams S, Van Vooren S, Sijmons R, Swertz M, Hastings R. Diagnostic interpretation of array data using public databases and internet sources. Hum Mutat 2016; 33:930-40. [PMID: 26285306 DOI: 10.1002/humu.22049] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Indexed: 12/21/2022]
Abstract
The range of commercially available array platforms and analysis software packages is expanding and their utility is improving, making reliable detection of copy-number variants (CNVs) relatively straightforward. Reliable interpretation of CNV data, however, is often difficult and requires expertise. With our knowledge of the human genome growing rapidly, applications for array testing continuously broadening, and the resolution of CNV detection increasing, this leads to great complexity in interpreting what can be daunting data. Correct CNV interpretation and optimal use of the genotype information provided by single-nucleotide polymorphism probes on an array depends largely on knowledge present in various resources. In addition to the availability of host laboratories' own datasets and national registries, there are several public databases and Internet resources with genotype and phenotype information that can be used for array data interpretation. With so many resources now available, it is important to know which are fit-for-purpose in a diagnostic setting. We summarize the characteristics of the most commonly used Internet databases and resources, and propose a general data interpretation strategy that can be used for comparative hybridization, comparative intensity, and genotype-based array data.
Affiliation(s)
- Nicole de Leeuw
- Department of Human Genetics, Radboud University Nijmegen Medical Centre, Nijmegen, the Netherlands
|
24
|
Hettne KM, Thompson M, van Haagen HHHBM, van der Horst E, Kaliyaperumal R, Mina E, Tatum Z, Laros JFJ, van Mulligen EM, Schuemie M, Aten E, Li TS, Bruskiewich R, Good BM, Su AI, Kors JA, den Dunnen J, van Ommen GJB, Roos M, ‘t Hoen PA, Mons B, Schultes EA. The Implicitome: A Resource for Rationalizing Gene-Disease Associations. PLoS One 2016; 11:e0149621. [PMID: 26919047 PMCID: PMC4769089 DOI: 10.1371/journal.pone.0149621] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/22/2015] [Accepted: 02/03/2016] [Indexed: 11/19/2022] Open
Abstract
High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or are even unintended by the original authors, but they vastly extend the reach of existing biomedical knowledge for identification and interpretation of gene-disease associations. The implicitome can be used in conjunction with experimental data resources to rationalize both known and novel associations. We demonstrate the usefulness of the implicitome by rationalizing known and novel gene-disease associations, including those from GWAS. To facilitate the re-use of implicit gene-disease associations, we publish our data in compliance with FAIR Data Publishing recommendations [https://www.force11.org/group/fairgroup] using nanopublications. An online tool (http://knowledge.bio) is available to explore established and potential gene-disease associations in the context of other biomedical relations.
Affiliation(s)
- Kristina M. Hettne
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Mark Thompson
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Eelke van der Horst
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Rajaram Kaliyaperumal
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Eleni Mina
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Zuotian Tatum
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jeroen F. J. Laros
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Erik M. van Mulligen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
- Martijn Schuemie
- Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
- Emmelien Aten
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Tong Shu Li
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
- Benjamin M. Good
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
- Andrew I. Su
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
- Jan A. Kors
- Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
- Johan den Dunnen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Gert-Jan B. van Ommen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter A.C. ‘t Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Barend Mons
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
- Erik A. Schultes
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Institute for Advanced Computer Science, Leiden, The Netherlands
|
25
|
Dyke SOM, Philippakis AA, Rambla De Argila J, Paltoo DN, Luetkemeier ES, Knoppers BM, Brookes AJ, Spalding JD, Thompson M, Roos M, Boycott KM, Brudno M, Hurles M, Rehm HL, Matern A, Fiume M, Sherry ST. Consent Codes: Upholding Standard Data Use Conditions. PLoS Genet 2016; 12:e1005772. [PMID: 26796797 PMCID: PMC4721915 DOI: 10.1371/journal.pgen.1005772] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022] Open
Abstract
A systematic way of recording data use conditions that are based on consent permissions as found in the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).
Affiliation(s)
- Stephanie O. M. Dyke
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Jordi Rambla De Argila
- Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Dina N. Paltoo
- Office of Science Policy, Office of the Director, National Institutes of Health, Bethesda, Maryland, United States of America
- Erin S. Luetkemeier
- Office of Science Policy, Office of the Director, National Institutes of Health, Bethesda, Maryland, United States of America
- Bartha M. Knoppers
- Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
- Anthony J. Brookes
- Department of Genetics, University of Leicester, Leicester, United Kingdom
- J. Dylan Spalding
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Mark Thompson
- Human Genetics Department, Leiden University Medical Center, Leiden, The Netherlands
- Marco Roos
- Human Genetics Department, Leiden University Medical Center, Leiden, The Netherlands
- Kym M. Boycott
- Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
- Michael Brudno
- Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Matthew Hurles
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Heidi L. Rehm
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Pathology, Brigham & Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Andreas Matern
- Bioreference Laboratories, Inc., Elmwood Park, New Jersey, United States of America
- Stephen T. Sherry
- National Centre for Biotechnology Information, US National Library of Medicine, Bethesda, Maryland, United States of America
|
26
|
Womack RP. Research Data in Core Journals in Biology, Chemistry, Mathematics, and Physics. PLoS One 2015; 10:e0143460. [PMID: 26636676 PMCID: PMC4670119 DOI: 10.1371/journal.pone.0143460] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 08/29/2015] [Accepted: 11/04/2015] [Indexed: 11/19/2022] Open
Abstract
This study takes a stratified random sample of articles published in 2014 from the top 10 journals in the disciplines of biology, chemistry, mathematics, and physics, as ranked by impact factor. Sampled articles were examined for their reporting of original data or reuse of prior data, and were coded for whether the data was publicly shared or otherwise made available to readers. Other characteristics such as the sharing of software code used for analysis and use of data citation and DOIs for data were examined. The study finds that data sharing practices are still relatively rare in these disciplines’ top journals, but that the disciplines have markedly different practices. Biology top journals share original data at the highest rate, and physics top journals share at the lowest rate. Overall, the study finds that within the top journals, only 13% of articles with original data published in 2014 make the data available to others.
Affiliation(s)
- Ryan P Womack
- Rutgers University Libraries, Rutgers-The State University of New Jersey, New Brunswick, New Jersey, United States of America
|
27
|
van Dam JCJ, Koehorst JJ, Schaap PJ, Martins dos Santos VAP, Suarez-Diez M. RDF2Graph a tool to recover, understand and validate the ontology of an RDF resource. J Biomed Semantics 2015; 6:39. [PMID: 26500754 PMCID: PMC4619317 DOI: 10.1186/s13326-015-0038-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/23/2015] [Accepted: 09/23/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Semantic web technologies have a tremendous potential for the integration of heterogeneous data sets. Therefore, an increasing number of widely used biological resources are becoming available in the RDF data model. There are, however, no tools available that provide structural overviews of these resources. Such structural overviews are essential to efficiently query these resources and to assess their structural integrity and design, thereby strengthening their use and potential. RESULTS Here we present RDF2Graph, a tool that automatically recovers the structure of an RDF resource. The generated overview makes it possible to create complex queries on these resources and to structurally validate newly created resources. CONCLUSION RDF2Graph facilitates the creation of complex queries, thereby enabling access to knowledge stored across multiple RDF resources. RDF2Graph facilitates the creation of high-quality resources and resource descriptions, which in turn increases the usability of semantic web technologies.
Affiliation(s)
- Jesse CJ van Dam
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, 6703 HB The Netherlands
- Jasper J Koehorst
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, 6703 HB The Netherlands
- Peter J Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, 6703 HB The Netherlands
- Vitor AP Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, 6703 HB The Netherlands
- LifeGlimmer, GmbH, Markelstrasse 38, Berlin, Germany
- Maria Suarez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, Wageningen, 6703 HB The Netherlands
|
28
|
Lopes P, Oliveira JL. An automated real-time integration and interoperability framework for bioinformatics. BMC Bioinformatics 2015; 16:328. [PMID: 26464306 PMCID: PMC4603302 DOI: 10.1186/s12859-015-0761-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/23/2015] [Accepted: 10/06/2015] [Indexed: 11/29/2022] Open
Abstract
Background In recent years data integration has become an everyday undertaking for life sciences researchers. Aggregating and processing data from disparate sources, whether through specifically developed software or via manual processes, is a common task for scientists. However, the scope and usability of the majority of current integration tools fail to deal with the fast-growing and highly dynamic nature of biomedical data. Results In this work we introduce a reactive and event-driven framework that simplifies real-time data integration and interoperability. This platform facilitates otherwise difficult tasks, such as connecting heterogeneous services, indexing, linking and transferring data from distinct resources, or subscribing to notifications regarding the timeliness of dynamic data. For developers, the framework automates the deployment of integrative and interoperable bioinformatics applications, using atomic data storage for content change detection, and enabling agent-based intelligent extract, transform and load tasks. Conclusions This work bridges the gap between the growing number of services, accessing specific data sources or algorithms, and the growing number of users, performing simple integration tasks on a recurring basis, through a streamlined workspace available to researchers and developers alike.
Affiliation(s)
- Pedro Lopes
- DETI/IEETA, Universidade de Aveiro, Campus Universitario de Santiago, Aveiro, 3810-193, Portugal.
- José Luís Oliveira
- DETI/IEETA, Universidade de Aveiro, Campus Universitario de Santiago, Aveiro, 3810-193, Portugal.
|
29
|
Read KB, Sheehan JR, Huerta MF, Knecht LS, Mork JG, Humphreys BL. Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study. PLoS One 2015. [PMID: 26207759 PMCID: PMC4514623 DOI: 10.1371/journal.pone.0132735] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022] [Open Access]
Abstract
Objective This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository. Methods We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed and deposited in PubMed Central (PMC) to identify those that indicate data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article. Results About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% that are invisible datasets. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets, suggesting there were approximately 200,000 to 235,000 invisible datasets generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects. Conclusion In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.
Affiliation(s)
- Kevin B. Read
- Medical Library, NYU Langone Medical Center, New York, New York, United States of America
- Jerry R. Sheehan
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Michael F. Huerta
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Lou S. Knecht
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- James G. Mork
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Betsy L. Humphreys
- National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
30
|
González-Beltrán A, Li P, Zhao J, Avila-Garcia MS, Roos M, Thompson M, van der Horst E, Kaliyaperumal R, Luo R, Lee TL, Lam TW, Edmunds SC, Sansone SA, Rocca-Serra P. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics. PLoS One 2015; 10:e0127612. [PMID: 26154165 PMCID: PMC4495984 DOI: 10.1371/journal.pone.0127612] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/04/2014] [Accepted: 04/16/2015] [Indexed: 12/20/2022] [Open Access]
Abstract
MOTIVATION Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler. RESULTS Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.
AVAILABILITY SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/. CONTACT philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk.
Affiliation(s)
- Peter Li
- GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
- Jun Zhao
- InfoLab21, Lancaster University, Bailrigg, Lancaster, LA1 4WA, United Kingdom
- Maria Susana Avila-Garcia
- Nuffield Department of Medicine, Experimental Medicine Division, John Radcliffe Hospital, Headley Way, Headington, Oxford, OX3 9DU, United Kingdom
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Mark Thompson
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Eelke van der Horst
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Rajaram Kaliyaperumal
- Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
- Ruibang Luo
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
- Tin-Lap Lee
- School of Biomedical Sciences and CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong, People’s Republic of China
- Tak-wah Lam
- HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
- Scott C. Edmunds
- GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
- Philippe Rocca-Serra
- Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom
|
31
|
Uddin S, Khan A, Baur LA. A framework to explore the knowledge structure of multidisciplinary research fields. PLoS One 2015; 10:e0123537. [PMID: 25915521 PMCID: PMC4410998 DOI: 10.1371/journal.pone.0123537] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/23/2014] [Accepted: 03/04/2015] [Indexed: 01/08/2023] [Open Access]
Abstract
Understanding emerging areas of a multidisciplinary research field is crucial for researchers, policymakers and other stakeholders. For them a knowledge structure based on longitudinal bibliographic data can be an effective instrument. But with the vast amount of available online information it is often hard to understand the knowledge structure for data. In this paper, we present a novel approach for retrieving online bibliographic data and propose a framework for exploring knowledge structure. We also present several longitudinal analyses to interpret and visualize the last 20 years of published obesity research data.
Affiliation(s)
- Shahadat Uddin
- Complex Systems Research Group, Project Management Program, University of Sydney, Sydney, New South Wales, Australia
- Arif Khan
- Complex Systems Research Group, Project Management Program, University of Sydney, Sydney, New South Wales, Australia
- Louise A. Baur
- Discipline of Paediatrics & Child Health, and Sydney School of Public Health, University of Sydney, Sydney, New South Wales, Australia
|
32
|
Bravo À, Piñero J, Queralt-Rosinach N, Rautschka M, Furlong LI. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics 2015; 16:55. [PMID: 25886734 PMCID: PMC4466840 DOI: 10.1186/s12859-015-0472-9] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Received: 07/24/2014] [Accepted: 01/19/2015] [Indexed: 11/23/2022] [Open Access]
Abstract
Background Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases. Results By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications. Conclusions BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text-mined data with data curated by experts is a suitable approach to both assess data quality and highlight novel and interesting information. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0472-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Àlex Bravo
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Janet Piñero
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Núria Queralt-Rosinach
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Michael Rautschka
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
- Laura I Furlong
- Research Programme on Biomedical Informatics (GRIB), IMIM, DCEXS, Universitat Pompeu Fabra, Barcelona, Spain.
|
33
|
Eijssen L, Evelo C, Kok R, Mons B, Hooft R. The Dutch Techcentre for Life Sciences: Enabling data-intensive life science research in the Netherlands. F1000Res 2015; 4:33. [PMID: 26913186 PMCID: PMC4743138 DOI: 10.12688/f1000research.6009.2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Accepted: 01/04/2016] [Indexed: 11/20/2022] [Open Access]
Abstract
We describe the Data programme of the Dutch Techcentre for Life Sciences (DTL, www.dtls.nl). DTL is a new national organisation in scientific research that facilitates life scientists with technologies and technological expertise in an era where new projects often are data-intensive, multi-disciplinary, and multi-site. It is run as a lean not-for-profit organisation with research organisations (both academic and industrial) as paying members. The small staff of the organisation undertakes a variety of tasks that are necessary to perform or support modern academic research, but that are not easily undertaken in a purely academic setting. DTL Data takes care of such tasks related to data stewardship, facilitating exchange of knowledge and expertise, and brokering access to e-infrastructure. DTL also represents the Netherlands in ELIXIR, the European infrastructure for life science data. The organisation is still being fine-tuned and this will continue over time, as it is crucial for this kind of organisation to adapt to a constantly changing environment. However, already being underway for several years, our experiences can benefit researchers in other fields or other countries setting up similar initiatives.
Affiliation(s)
- Lars Eijssen
- Department of Bioinformatics - BiGCaT, Maastricht University, 6229 ER Maastricht, Netherlands
- Chris Evelo
- Department of Bioinformatics - BiGCaT, Maastricht University, 6229 ER Maastricht, Netherlands
- Ruben Kok
- Dutch Techcentre for Life Sciences (Foundation office), Catharijnesingel 54, 3511 GC Utrecht, Netherlands
- Barend Mons
- Dutch Techcentre for Life Sciences (Foundation office), Catharijnesingel 54, 3511 GC Utrecht, Netherlands; Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, Netherlands; Leiden University Medical Center, Albinusdreef 2, 2333 ZA, Leiden, Netherlands
- Rob Hooft
- Dutch Techcentre for Life Sciences (Foundation office), Catharijnesingel 54, 3511 GC Utrecht, Netherlands; Netherlands eScience Center, Science Park 140, 1098 XG Amsterdam, Netherlands
|
34
|
Yu Q, Ding Y, Song M, Song S, Liu J, Zhang B. Tracing database usage: Detecting main paths in database link networks. J Informetr 2015. [DOI: 10.1016/j.joi.2014.10.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
|
35
|
Rybinski M, Aldana-Montes J. Calculating semantic relatedness for biomedical use in a knowledge-poor environment. BMC Bioinformatics 2014; 15 Suppl 14:S2. [PMID: 25471751 PMCID: PMC4255738 DOI: 10.1186/1471-2105-15-s14-s2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] [Open Access]
Abstract
Background Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published each year. Most methods benefit from making use of highly specific resources, thus reducing their usability in many real-world scenarios that differ from the original assumptions. In this paper we present a simple resource-efficient method for calculating semantic relatedness in a knowledge-poor environment. The method obtains results comparable to state-of-the-art methods, while being more generic and flexible. The solution being presented here was designed to use only a relatively generic and small document corpus and its statistics, without referring to a previously defined knowledge base, thus it does not assume a 'closed' problem. Results We propose a method in which computation for two input texts is based on the idea of comparing the vocabulary associated with the best-fit documents related to those texts. As keyterm extraction is a costly process, it is done in a preprocessing step on a 'per-document' basis in order to limit the on-line processing. The actual computations are executed in a compact vector space, limited by the most informative extraction results. The method has been evaluated on five direct benchmarks by calculating correlation coefficients w.r.t. average human answers. It also has been used on Gene-Disease and Disease-Disease data pairs to highlight its potential use as a data analysis tool. Apart from comparisons with reported results, some interesting features of the method have been studied, i.e. the relationship between result quality, efficiency and applicable trimming threshold for size reduction. Experimental evaluation shows that the presented method obtains results that are comparable with current state-of-the-art methods, even surpassing them on a majority of the benchmarks. Additionally, a possible usage scenario for the method is showcased with a real-world data experiment. Conclusions Our method improves flexibility of the existing methods without a notable loss of quality. It is a legitimate alternative to the costly construction of specialized knowledge-rich resources.
|
36
|
Hettne KM, Dharuri H, Zhao J, Wolstencroft K, Belhajjame K, Soiland-Reyes S, Mina E, Thompson M, Cruickshank D, Verdes-Montenegro L, Garrido J, de Roure D, Corcho O, Klyne G, van Schouwen R, ‘t Hoen PAC, Bechhofer S, Goble C, Roos M. Structuring research methods and data with the research object model: genomics workflows as a case study. J Biomed Semantics 2014; 5:41. [PMID: 25276335 PMCID: PMC4177597 DOI: 10.1186/2041-1480-5-41] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/13/2013] [Accepted: 07/29/2014] [Indexed: 12/24/2022] [Open Access]
Abstract
BACKGROUND One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows. RESULTS We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?". CONCLUSIONS Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. AVAILABILITY The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.
Affiliation(s)
- Kristina M Hettne
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Harish Dharuri
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jun Zhao
- Department of Zoology, University of Oxford, Oxford, UK
- Katherine Wolstencroft
- School of Computer Science, University of Manchester, Manchester, UK
- Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Khalid Belhajjame
- School of Computer Science, University of Manchester, Manchester, UK
- Eleni Mina
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Mark Thompson
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- David de Roure
- Department of Zoology, University of Oxford, Oxford, UK
- Oscar Corcho
- Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
- Graham Klyne
- Department of Zoology, University of Oxford, Oxford, UK
- Reinout van Schouwen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter A C ‘t Hoen
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Sean Bechhofer
- School of Computer Science, University of Manchester, Manchester, UK
- Carole Goble
- School of Computer Science, University of Manchester, Manchester, UK
- Marco Roos
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
|
37
|
Good BM, Ainscough BJ, McMichael JF, Su AI, Griffith OL. Organizing knowledge to enable personalization of medicine in cancer. Genome Biol 2014; 15:438. [PMID: 25222080 PMCID: PMC4281950 DOI: 10.1186/s13059-014-0438-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 12/21/2022] [Open Access]
Abstract
Interpretation of the clinical significance of genomic alterations remains the most severe bottleneck preventing the realization of personalized medicine in cancer. We propose a knowledge commons to facilitate collaborative contributions and open discussion of clinical decision-making based on genomic events in cancer.
|
38
|
Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. J Biomed Semantics 2014; 5:28. [PMID: 26261718 PMCID: PMC4530550 DOI: 10.1186/2041-1480-5-28] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 03/12/2013] [Accepted: 06/16/2014] [Indexed: 11/10/2022] [Open Access]
Abstract
Background Scientific publications are documentary representations of defeasible arguments, supported by data and repeatable methods. They are the essential mediating artifacts in the ecosystem of scientific communications. The institutional “goal” of science is publishing results. The linear document publication format, dating from 1665, has survived transition to the Web. Intractable publication volumes; the difficulty of verifying evidence; and observed problems in evidence and citation chains suggest a need for a web-friendly and machine-tractable model of scientific publications. This model should support: digital summarization, evidence examination, challenge, verification and remix, and incremental adoption. Such a model must be capable of expressing a broad spectrum of representational complexity, ranging from minimal to maximal forms. Results The micropublications semantic model of scientific argument and evidence provides these features. Micropublications support natural language statements; data; methods and materials specifications; discussion and commentary; challenge and disagreement; as well as allowing many kinds of statement formalization. The minimal form of a micropublication is a statement with its attribution. The maximal form is a statement with its complete supporting argument, consisting of all relevant evidence, interpretations, discussion and challenges brought forward in support of or opposition to it. Micropublications may be formalized and serialized in multiple ways, including in RDF. They may be added to publications as stand-off metadata. An OWL 2 vocabulary for micropublications is available at http://purl.org/mp. A discussion of this vocabulary, along with RDF examples from the case studies, appears as OWL Vocabulary and RDF Examples in Additional file 1. Conclusion Micropublications, because they model evidence and allow qualified, nuanced assertions, can play essential roles in the scientific communications ecosystem in places where simpler, formalized and purely statement-based models, such as the nanopublications model, will not be sufficient. At the same time they will add significant value to, and are intentionally compatible with, statement-based formalizations. We suggest that micropublications, generated by useful software tools supporting such activities as writing, editing, reviewing, and discussion, will be of great value in improving the quality and tractability of biomedical communications.
|
39
|
Ding Y, Zhang G, Chambers T, Song M, Wang X, Zhai C. Content-based citation analysis: The next generation of citation analysis. J Assoc Inf Sci Technol 2014. [DOI: 10.1002/asi.23256] [Citation(s) in RCA: 130] [Impact Index Per Article: 13.0] [Indexed: 01/18/2023]
Affiliation(s)
- Ying Ding
- Department of Information & Library Science; School of Informatics & Computing; Indiana University; 1320 E. 10th St., LI 011 Bloomington IN 47405-3907
- Guo Zhang
- Department of Information & Library Science; School of Informatics & Computing; Indiana University; 1320 E. 10th St., LI 011 Bloomington IN 47405-3907
- Tamy Chambers
- Department of Information & Library Science; School of Informatics & Computing; Indiana University; 1320 E. 10th St., LI 011 Bloomington IN 47405-3907
- Min Song
- Department of Library and Information Science, College of Liberal Arts; Yonsei University; 50 Yonsei-Ro, Seodaemun-Gu Seoul 120-749 South Korea
- Xiaolong Wang
- Department of Computer Science; College of Engineering; University of Illinois; 201 North Goodwin Avenue Urbana IL 61801-2302
- Chengxiang Zhai
- Department of Computer Science; College of Engineering; University of Illinois; 201 North Goodwin Avenue Urbana IL 61801-2302
|
40
|
Zdrazil B, Chichester C, Zander Balderud L, Engkvist O, Gaulton A, Overington JP. Transporter assays and assay ontologies: useful tools for drug discovery. Drug Discovery Today: Technologies 2014; 12:e47-e54. [PMID: 25027375 DOI: 10.1016/j.ddtec.2014.03.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
Transport proteins represent an eminent class of drug targets and ADMET (absorption, distribution, metabolism, excretion, toxicity) associated genes. There exists a large number of distinct activity assays for transport proteins, owing not only to the measurement needed (e.g. transport activity, strength of ligand–protein interaction) but also to the heterogeneous assay setups used by different research groups. Efforts to systematically organize this (divergent) bioassay data have large potential impact in Public-Private partnership and conventional commercial drug discovery. In this short review, we highlight some of the frequently used high-throughput assays for transport proteins, and we discuss emerging assay ontologies and their application to this field. Focusing on human P-glycoprotein (Multidrug resistance protein 1; gene name: ABCB1, MDR1), we exemplify how annotation of bioassay data per target class could improve and add to existing ontologies, and we propose to include an additional layer of metadata supporting data fusion across different bioassays.
Affiliation(s)
- Barbara Zdrazil
- University of Vienna, Division of Drug Design and Medicinal Chemistry, Department of Pharmaceutical Chemistry, Pharmacoinformatics Research Group, Althanstrasse 14, A-1090 Vienna, Austria
- Christine Chichester
- Swiss Institute of Bioinformatics, CALIPHO Group, CMU - Rue Michel-Servet 1, 1211 Geneva 4, Switzerland
- Ola Engkvist
- Discovery Sciences, Chemistry Innovation Center, AstraZeneca R&D, Mölndal, Sweden
- Anna Gaulton
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- John P Overington
- European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
|
41
|
Belter CW. Measuring the value of research data: a citation analysis of oceanographic data sets. PLoS One 2014; 9:e92590. [PMID: 24671177 PMCID: PMC3966791 DOI: 10.1371/journal.pone.0092590] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 07/24/2013] [Accepted: 02/25/2014] [Indexed: 11/24/2022] [Open Access]
Abstract
Evaluation of scientific research is becoming increasingly reliant on publication-based bibliometric indicators, which may result in the devaluation of other scientific activities--such as data curation--that do not necessarily result in the production of scientific publications. This issue may undermine the movement to openly share and cite data sets in scientific publications because researchers are unlikely to devote the effort necessary to curate their research data if they are unlikely to receive credit for doing so. This analysis attempts to demonstrate the bibliometric impact of properly curated and openly accessible data sets by attempting to generate citation counts for three data sets archived at the National Oceanographic Data Center. My findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years. I also find that methods of citing and referring to these data sets in scientific publications are highly inconsistent, despite the fact that a formal citation format is suggested for each data set. These findings have important implications for developing a data citation format, encouraging researchers to properly curate their research data, and evaluating the bibliometric impact of individuals and institutions.
Affiliation(s)
- Christopher W. Belter
- LAC Group, Central Library, National Oceanic and Atmospheric Administration, Silver Spring, Maryland, United States of America
42
Dumontier M, Baker CJ, Baran J, Callahan A, Chepelev L, Cruz-Toledo J, Del Rio NR, Duck G, Furlong LI, Keath N, Klassen D, McCusker JP, Queralt-Rosinach N, Samwald M, Villanueva-Rosales N, Wilkinson MD, Hoehndorf R. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semantics 2014; 5:14. [PMID: 24602174 PMCID: PMC4015691 DOI: 10.1186/2041-1480-5-14] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 07/02/2013] [Accepted: 02/02/2014] [Indexed: 11/10/2022] Open
Abstract
The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a Creative Commons Attribution license. See website for further information: http://sio.semanticscience.org.
Affiliation(s)
- Michel Dumontier
- Center for Biomedical Informatics Research, Stanford University, Stanford, California, USA.
43
Abstract
Purpose
– The aim of this paper is to reposition the research library in the context of the changing information and knowledge architecture at the end of the “Gutenberg Parenthesis” and as part of the rapidly emerging “semantic” environment of the Linked Open Data paradigm. Understanding this process requires a good understanding of the evolution of the “document” notion in the passage from print based culture to the distributed hypertextual and RDF based information architecture of the WWW.
Design/methodology/approach
– These objectives are reached using literature study and a descriptive historical approach as well as text mining techniques using Google nGrams as a data source.
Findings
– The paper presents a proposal for effectively repositioning research libraries in the context of eScience and eScholarship as well as clear indications of the proposed repositioning already taking place. Furthermore, a new perspective of the “document” notion is provided.
Practical implications
– The evolution described in the contribution creates opportunities for libraries to reposition themselves as aggregators and selectors of content and as contextualising agents as part of future Linked Data based scholarly research environments provided they are able and ready to operate the related cultural changes.
Originality/value
– The paper will be useful for practitioners in search of strategic guidance for repositioning their librarian institutions in a context of ever increasing competition for scarce funding resources.
44
Jimeno Yepes A, Verspoor K. Literature mining of genetic variants for curation: quantifying the importance of supplementary material. Database: The Journal of Biological Databases and Curation 2014; 2014:bau003. [PMID: 24520105 PMCID: PMC3920087 DOI: 10.1093/database/bau003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022]
Abstract
A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains 'all of the information', and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication.
Affiliation(s)
- Antonio Jimeno Yepes
- National ICT Australia, Victoria Research Laboratory, Melbourne, Australia and Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
45
Patterson DJ, Egloff W, Agosti D, Eades D, Franz N, Hagedorn G, Rees JA, Remsen DP. Scientific names of organisms: attribution, rights, and licensing. BMC Res Notes 2014; 7:79. [PMID: 24495358 PMCID: PMC3922623 DOI: 10.1186/1756-0500-7-79] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 09/02/2013] [Accepted: 01/28/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND As biological disciplines extend into the 'big data' world, they will need a names-based infrastructure to index and interconnect distributed data. The infrastructure must have access to all names of all organisms if it is to manage all information. Those who compile lists of species hold different views as to the intellectual property rights that apply to the lists. This creates uncertainty that impedes the development of a much-needed infrastructure for sharing biological data in the digital world. FINDINGS The laws in the United States of America and European Union are consistent with the position that scientific names of organisms and their compilation in checklists, classifications or taxonomic revisions are not subject to copyright. Compilations of names, such as classifications or checklists, are not creative in the sense of copyright law. Many content providers desire credit for their efforts. CONCLUSIONS A 'blue list' identifies elements of checklists, classifications and monographs to which intellectual property rights do not apply. To promote sharing, authors of taxonomic content, compilers, intermediaries, and aggregators should receive citable recognition for their contributions, with the greatest recognition being given to the originating authors. Mechanisms for achieving this are discussed.
Affiliation(s)
- David J Patterson
- School of Life Sciences, Arizona State University, Tempe, Arizona 85287, USA.
46
Shabo A. Towards a translational health information language. EPMA J 2014. [PMCID: PMC4125964 DOI: 10.1186/1878-5085-5-s1-a51] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
47
Elliott JH, Turner T, Clavisi O, Thomas J, Higgins JPT, Mavergames C, Gruen RL. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med 2014; 11:e1001603. [PMID: 24558353 PMCID: PMC3928029 DOI: 10.1371/journal.pmed.1001603] [Citation(s) in RCA: 307] [Impact Index Per Article: 30.7] [Indexed: 12/26/2022] Open
Abstract
The current difficulties in keeping systematic reviews up to date lead to considerable inaccuracy, hampering the translation of knowledge into action. Incremental advances in conventional review updating are unlikely to lead to substantial improvements in review currency. A new approach is needed. We propose living systematic review as a contribution to evidence synthesis that combines currency with rigour to enhance the accuracy and utility of health evidence. Living systematic reviews are high quality, up-to-date online summaries of health research, updated as new research becomes available, and enabled by improved production efficiency and adherence to the norms of scholarly communication. Together with innovations in primary research reporting and the creation and use of evidence in health systems, living systematic review contributes to an emerging evidence ecosystem.
Affiliation(s)
- Julian H. Elliott
- Department of Infectious Diseases, Alfred Hospital and Monash University, Melbourne, Australia
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- Tari Turner
- School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia
- World Vision Australia, Melbourne, Australia
- Ornella Clavisi
- National Trauma Research Institute, Alfred Hospital, Melbourne, Australia
- James Thomas
- EPPI-Centre, Institute of Education, University of London, London, England
- Julian P. T. Higgins
- School of Social and Community Medicine, University of Bristol, Bristol, England
- Centre for Reviews and Dissemination, University of York, York, England
- Chris Mavergames
- Informatics and Knowledge Management Department, The Cochrane Collaboration, Freiburg, Germany
- Russell L. Gruen
- National Trauma Research Institute, Alfred Hospital, Melbourne, Australia
- Department of Surgery, Monash University, Melbourne, Australia
48
Qin H, Davis L, Mayernik M, Lankao PR, D'Ignazio J, Alston P. Variables As Currency: Linking Meta-Analysis Research and Data Paths in Sciences. Data Science Journal 2014. [DOI: 10.2481/dsj.14-030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022] Open
49
Livingston KM, Bada M, Hunter LE, Verspoor K. Representing annotation compositionality and provenance for the Semantic Web. J Biomed Semantics 2013; 4:38. [PMID: 24268021 PMCID: PMC4129183 DOI: 10.1186/2041-1480-4-38] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/12/2013] [Accepted: 09/20/2013] [Indexed: 12/03/2022] Open
Abstract
Background Though the annotation of digital artifacts with metadata has a long history, the bulk of that work focuses on the association of single terms or concepts to single targets. As annotation efforts expand to capture more complex information, annotations will need to be able to refer to knowledge structures formally defined in terms of more atomic knowledge structures. Existing provenance efforts in the Semantic Web domain primarily focus on tracking provenance at the level of whole triples and do not provide enough detail to track how individual triple elements of annotations were derived from triple elements of other annotations. Results We present a task- and domain-independent ontological model for capturing annotations and their linkage to their denoted knowledge representations, which can be singular concepts or more complex sets of assertions. We have implemented this model as an extension of the Information Artifact Ontology in OWL and made it freely available, and we show how it can be integrated with several prominent annotation and provenance models. We present several application areas for the model, ranging from linguistic annotation of text to the annotation of disease-associations in genome sequences. Conclusions With this model, progressively more complex annotations can be composed from other annotations, and the provenance of compositional annotations can be represented at the annotation level or at the level of individual elements of the RDF triples composing the annotations. This in turn allows for progressively richer annotations to be constructed from previous annotation efforts, the precise provenance recording of which facilitates evidence-based inference and error tracking.
Affiliation(s)
- Kevin M Livingston
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Michael Bada
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Lawrence E Hunter
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Karin Verspoor
- National ICT Australia, Victoria Research Laboratory, Melbourne, VIC 3010, Australia; Department of Computing and Information Systems, The University of Melbourne, Melbourne, VIC 3010, Australia
50
Rebholz-Schuhmann D, Grabmüller C, Kavaliauskas S, Croset S, Woollard P, Backofen R, Filsell W, Clark D. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources. Drug Discov Today 2013; 19:882-9. [PMID: 24201223 DOI: 10.1016/j.drudis.2013.10.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/17/2012] [Revised: 09/24/2013] [Accepted: 10/28/2013] [Indexed: 10/26/2022]
Abstract
In the Semantic Enrichment of the Scientific Literature (SESL) project, researchers from academia and from life science and publishing companies collaborated in a pre-competitive way to integrate and share information for type 2 diabetes mellitus (T2DM) in adults. This case study exposes benefits from semantic interoperability after integrating the scientific literature with biomedical data resources, such as UniProt Knowledgebase (UniProtKB) and the Gene Expression Atlas (GXA). We annotated scientific documents in a standardized way, by applying public terminological resources for diseases and proteins, and other text-mining approaches. Eventually, we compared the genetic causes of T2DM across the data resources to demonstrate the benefits from the SESL triple store. Our solution enables publishers to distribute their content with little overhead into remote data infrastructures, such as into any Virtual Knowledge Broker.
Affiliation(s)
- Dietrich Rebholz-Schuhmann
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; Computerlinguistik, Universität Zürich, Binzmühlestrasse 14, 8050 Zürich, Switzerland.
- Christoph Grabmüller
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Silvestras Kavaliauskas
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Samuel Croset
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter Woollard
- GlaxoSmithKline, GlaxoSmithKline Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, UK
- Rolf Backofen
- Albert-Ludwigs-University Freiburg, Fahnenbergplatz, D-79085 Freiburg, Germany
- Wendy Filsell
- Unilever R&D, Colworth Science Park, Sharnbrook MK44 1LQ, UK
- Dominic Clark
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK