Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Mons B, van Haagen H, Chichester C, 't Hoen P, den Dunnen JT, van Ommen G, van Mulligen E, Singh B, Hooft R, Roos M, Hammond J, Kiesel B, Giardine B, Velterop J, Groth P, Schultes E. The value of data. Nat Genet 2011;43:281-3. [DOI: 10.1038/ng0411-281] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Indexed: 11/08/2022]

Cited by Other Article(s)

Rossini M, Montanaro G, Montreuil O, Tarasov S. Towards computable taxonomic knowledge: Leveraging nanopublications for sharing new synonyms in the Madagascan genus Helictopleurus (Coleoptera, Scarabaeinae). Biodivers Data J 2024;12:e120304. [PMID: 38912110 PMCID: PMC11193050 DOI: 10.3897/bdj.12.e120304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 05/14/2024] [Indexed: 06/25/2024]

Morley J, Hamilton N, Floridi L. Selling NHS patient data. BMJ 2024;384:q420. [PMID: 38387965 DOI: 10.1136/bmj.q420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2024]

Schultes E, Roos M, Bonino da Silva Santos LO, Guizzardi G, Bouwman J, Hankemeier T, Baak A, Mons B. FAIR Digital Twins for Data-Intensive Research. Front Big Data 2022;5:883341. [PMID: 35647536 PMCID: PMC9130601 DOI: 10.3389/fdata.2022.883341] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/24/2022] [Accepted: 04/12/2022] [Indexed: 11/13/2022]

Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements. JOURNAL OF DATA AND INFORMATION SCIENCE 2022. [DOI: 10.2478/jdis-2022-0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]

Abstract

Purpose

Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.

Design/methodology/approach

Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject-object pairs (SO pairs).

Findings

The results indicated an extraordinary growth of cardiovascular publications in China but only a modest growth of novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the Top 10 SO pairs with the highest IE, which implied epistemic status pluralism. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.

Research limitations

The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.

Practical implications

Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. These areas are suggested to be prioritized; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.

Originality/value

We provided a novel approach by combining natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
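The abstract above names information entropy (IE) as the uncertainty metric for an SO pair but does not give the formula. The following is a minimal sketch, assuming IE is the Shannon entropy of the distribution of epistemic-status labels attached to the statements that mention a pair. The function name, the label "certain", and the sample data are hypothetical illustrations, not taken from the cited study.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels.

    Assumption: each label is the epistemic status of one statement about
    a given subject-object pair (for example unknown, hedging, conflicting).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probabilities = [count / total for count in counts.values()]
    return sum(-p * math.log2(p) for p in probabilities)


# Hypothetical statements about two SO pairs.
contested_pair = ["certain", "hedging", "conflicting", "unknown"]
settled_pair = ["certain", "certain", "certain", "certain"]

print(shannon_entropy(contested_pair))
print(shannon_entropy(settled_pair))
```

Under this assumption, a pair whose statements are spread evenly over several statuses scores high, while a pair whose statements all share one status scores zero, which matches the abstract's reading of high IE as epistemic status pluralism.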

Kuhn T, Taelman R, Emonet V, Antonatos H, Soiland-Reyes S, Dumontier M. Semantic micro-contributions with decentralized nanopublication services. PeerJ Comput Sci 2021;7:e387. [PMID: 33817033 PMCID: PMC7959648 DOI: 10.7717/peerj-cs.387] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/26/2020] [Accepted: 01/19/2021] [Indexed: 06/12/2023]

Giachelle F, Dosso D, Silvello G. Search, access, and explore life science nanopublications on the Web. PeerJ Comput Sci 2021;7:e335. [PMID: 33816986 PMCID: PMC7959622 DOI: 10.7717/peerj-cs.335] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2020] [Accepted: 11/20/2020] [Indexed: 06/12/2023]

Li X, Rousseau JF, Ding Y, Song M, Lu W. Understanding Drug Repurposing From the Perspective of Biomedical Entities and Their Evolution: Bibliographic Research Using Aspirin. JMIR Med Inform 2020;8:e16739. [PMID: 32543442 PMCID: PMC7327595 DOI: 10.2196/16739] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/19/2019] [Revised: 01/08/2020] [Accepted: 03/31/2020] [Indexed: 12/26/2022]

Abstract

BACKGROUND

Drug development is still a costly and time-consuming process with a low rate of success. Drug repurposing (DR) has attracted considerable attention because of its significant advantages over traditional approaches in terms of development time, cost, and safety. Entitymetrics, defined as bibliometric indicators based on biomedical entities (eg, diseases, drugs, and genes) studied in the biomedical literature, make it possible for researchers to measure knowledge evolution and the transfer of drug research.

OBJECTIVE

The purpose of this study was to understand DR from the perspective of biomedical entities (diseases, drugs, and genes) and their evolution.

METHODS

In the work reported in this paper, we extended the bibliometric indicators of biomedical entities mentioned in PubMed to detect potential patterns of biomedical entities in various phases of drug research and investigate the factors driving DR. We used aspirin (acetylsalicylic acid) as the subject of the study since it can be repurposed for many applications. We propose 4 easy, transparent measures based on entitymetrics to investigate DR for aspirin: Popularity Index (P₁), Promising Index (P₂), Prestige Index (P₃), and Collaboration Index (CI).

RESULTS

We found that the maxima of P₁, P₃, and CI are closely associated with the different repurposing phases of aspirin. These metrics enabled us to observe the way in which biomedical entities interacted with the drug during the various phases of DR and to analyze the potential driving factors for DR at the entity level. P₁ and CI were indicative of the dynamic trends of a specific biomedical entity over a long time period, while P₂ was more sensitive to immediate changes. P₃ reflected the early signs of the practical value of biomedical entities and could be valuable for tracking the research frontiers of a drug.

CONCLUSIONS

In-depth studies of side effects and mechanisms, fierce market competition, and advanced life science technologies are driving factors for DR. This study showcases the way in which researchers can examine the evolution of DR using entitymetrics, an approach that can be valuable for enhancing decision making in the field of drug discovery and development.

Sustkova HP, Hettne KM, Wittenburg P, Jacobsen A, Kuhn T, Pergl R, Slifka J, McQuilton P, Magagna B, Sansone SA, Stocker M, Imming M, Lannom L, Musen M, Schultes E. FAIR Convergence Matrix: Optimizing the Reuse of Existing FAIR-Related Resources. DATA INTELLIGENCE 2020. [DOI: 10.1162/dint_a_00038] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/04/2022]

Slenter DN, Kutmon M, Hanspers K, Riutta A, Windsor J, Nunes N, Mélius J, Cirillo E, Coort SL, Digles D, Ehrhart F, Giesbertz P, Kalafati M, Martens M, Miller R, Nishida K, Rieswijk L, Waagmeester A, Eijssen LMT, Evelo CT, Pico AR, Willighagen EL. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res 2019;46:D661-D667. [PMID: 29136241 PMCID: PMC5753270 DOI: 10.1093/nar/gkx1064] [Citation(s) in RCA: 590] [Impact Index Per Article: 118.0] [Received: 09/14/2017] [Accepted: 10/25/2017] [Indexed: 02/06/2023]

Affiliation(s)

Denise N Slenter Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Martina Kutmon Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
Kristina Hanspers Gladstone Institutes, San Francisco, California, CA 94158, USA
Anders Riutta Gladstone Institutes, San Francisco, California, CA 94158, USA
Jacob Windsor Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Nuno Nunes Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Jonathan Mélius Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Elisa Cirillo Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Susan L Coort Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Daniela Digles University of Vienna, Department of Pharmaceutical Chemistry, 1090 Vienna, Austria
Friederike Ehrhart Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Pieter Giesbertz Chair of Nutritional Physiology, Technische Universität München, 85350 Freising, Germany
Marianthi Kalafati Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
Marvin Martens Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Ryan Miller Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands
Kozo Nishida Laboratory for Biochemical Simulation, RIKEN Quantitative Biology Center, Suita, Osaka 565-0874, Japan
Linda Rieswijk Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, CA 94720, USA
Andra Waagmeester Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Micelio, Antwerp, Belgium
Lars M T Eijssen Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; School for Mental Health and Neuroscience, Department of Psychiatry and Neuropsychology, Maastricht University Medical Centre, 6229 ER Maastricht, The Netherlands
Chris T Evelo Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands
Alexander R Pico Gladstone Institutes, San Francisco, California, CA 94158, USA
Egon L Willighagen Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands

Quantifying the impact of public omics data. Nat Commun 2019;10:3512. [PMID: 31383865 PMCID: PMC6683138 DOI: 10.1038/s41467-019-11461-w] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 06/26/2018] [Accepted: 07/12/2019] [Indexed: 11/25/2022]

Townend GS, Ehrhart F, van Kranen HJ, Wilkinson M, Jacobsen A, Roos M, Willighagen EL, van Enckevort D, Evelo CT, Curfs LMG. MECP2 variation in Rett syndrome-An overview of current coverage of genetic and phenotype data within existing databases. Hum Mutat 2018;39:914-924. [PMID: 29704307 PMCID: PMC6033003 DOI: 10.1002/humu.23542] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/22/2017] [Revised: 04/18/2018] [Accepted: 04/23/2018] [Indexed: 12/30/2022]

Advancing food, nutrition, and health research in Europe by connecting and building research infrastructures in a DISH-RI: Results of the EuroDISH project. Trends Food Sci Technol 2018. [DOI: 10.1016/j.tifs.2017.12.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/18/2022]

Guitton Y, Tremblay-Franco M, Le Corguillé G, Martin JF, Pétéra M, Roger-Mele P, Delabrière A, Goulitquer S, Monsoor M, Duperier C, Canlet C, Servien R, Tardivel P, Caron C, Giacomoni F, Thévenot EA. Create, run, share, publish, and reference your LC–MS, FIA–MS, GC–MS, and NMR data analysis workflows with the Workflow4Metabolomics 3.0 Galaxy online infrastructure for metabolomics. Int J Biochem Cell Biol 2017;93:89-101. [DOI: 10.1016/j.biocel.2017.07.002] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 01/28/2017] [Revised: 06/14/2017] [Accepted: 07/10/2017] [Indexed: 12/11/2022]

Linked Registries: Connecting Rare Diseases Patient Registries through a Semantic Web Layer. BIOMED RESEARCH INTERNATIONAL 2017;2017:8327980. [PMID: 29214177 PMCID: PMC5682045 DOI: 10.1155/2017/8327980] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/03/2017] [Revised: 06/11/2017] [Accepted: 10/02/2017] [Indexed: 12/28/2022]

McKiernan EC, Marrone DF. CA1 pyramidal cells have diverse biophysical properties, affected by development, experience, and aging. PeerJ 2017;5:e3836. [PMID: 28948109 PMCID: PMC5609525 DOI: 10.7717/peerj.3836] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/04/2017] [Accepted: 08/31/2017] [Indexed: 12/04/2022]

López-Massaguer O, Sanz F, Pastor M. An automated tool for obtaining QSAR-ready series of compounds using semantic web technologies. Bioinformatics 2017;34:131-133. [DOI: 10.1093/bioinformatics/btx566] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/25/2017] [Accepted: 09/06/2017] [Indexed: 11/13/2022]

Ding Y, Stirling K. Data-driven Discovery: A New Era of Exploiting the Literature and Data. JOURNAL OF DATA AND INFORMATION SCIENCE 2017. [DOI: 10.20309/jdis.201622] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]

Abstract

In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to help the understanding of challenges of data-driven discovery.

The Panama Canal, the 77-kilometer waterway connecting the Atlantic and Pacific oceans, has played a crucial role in international trade for more than a century. However, digging the Panama Canal was an exceedingly challenging process. A French effort in the late 19th century was abandoned because of equipment issues and a significant loss of labor due to tropical diseases transmitted by mosquitoes. The United States officially took control of the project in 1902. The United States replaced the unusable French equipment with new construction equipment that was designed for a much larger and faster scale of work. Colonel William C. Gorgas was appointed as the chief sanitation officer and charged with eliminating mosquito-spread illnesses. After overcoming these and additional trials and tribulations, the Canal successfully opened on August 15, 1914. The triumphant completion of the Panama Canal demonstrates that using the right tools and eliminating significant threats are critical steps in any project. More than 100 years later, a paradigm shift is occurring, as we move into a data-centered era. Today, data are extremely rich but overwhelming, and extracting information out of data requires not only the right tools and methods but also awareness of major threats.

In this data-intensive era, the traditional method of exploring the related publications and available datasets from previous experiments to arrive at a testable hypothesis is becoming obsolete. Consider the fact that a new article is published every 30 seconds (Jinha, 2010). In fact, for the common disease of diabetes, there have been roughly 500,000 articles published to date; even if a scientist reads 20 papers per day, he will need 68 years to wade through all the material. The standard method simply cannot sufficiently deal with the large volume of documents or the exponential growth of datasets. A major threat is that the canon of domain knowledge cannot be consumed and held in human memory. Without efficient methods to process information and without a way to eliminate the fundamental threat of limited memory and time to handle the data deluge, we may find ourselves facing failure as the French did on the Isthmus of Panama more than a century ago. Scouring the literature and data to generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets (Evans & Foster, 2011). In reality, most scholars have never been able to keep completely up-to-date with publications and datasets considering the unending increase in quantity and diversity of research within their own areas of focus, let alone in related conceptual areas in which knowledge may be segregated by syntactically impenetrable keyword barriers or an entirely different research corpus. Research communities in many disciplines are finally recognizing that with advances in information technology there need to be new ways to extract entities from increasingly data-intensive publications and to integrate and analyze large-scale datasets.

This provides a compelling opportunity to improve the process of knowledge discovery from the literature and datasets through use of knowledge graphs and an associated framework that integrates scholars, domain knowledge, datasets, workflows, and machines on a scale previously beyond our reach (Ding et al., 2013).
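The reading-time figure quoted in the text above (roughly 500,000 diabetes articles at 20 papers per day) can be checked with a short calculation. This sketch assumes reading on every day of a 365-day year, which the text does not state explicitly.

```latex
\[
\frac{500{,}000\ \text{articles}}{20\ \text{articles/day}} = 25{,}000\ \text{days},
\qquad
\frac{25{,}000\ \text{days}}{365\ \text{days/year}} \approx 68.5\ \text{years}
\]
```

Under that assumption the result agrees with the 68 years given in the text.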

Hassani-Pak K, Rawlings C. Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes. J Integr Bioinform 2017;14:/j/jib.ahead-of-print/jib-2016-0002/jib-2016-0002.xml. [PMID: 28609292 PMCID: PMC6042805 DOI: 10.1515/jib-2016-0002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/12/2017] [Accepted: 02/16/2017] [Indexed: 02/06/2023]

Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017;117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]

Penev L, Georgiev T, Geshev P, Demirov S, Senderov V, Kuzmova I, Kostadinova I, Peneva S, Stoev P. ARPHA-BioDiv: A toolbox for scholarly publication and dissemination of biodiversity data based on the ARPHA Publishing Platform. RESEARCH IDEAS AND OUTCOMES 2017. [DOI: 10.3897/rio.3.e13088] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/31/2022]

Goldmann D, Zdrazil B, Digles D, Ecker GF. Empowering pharmacoinformatics by linked life science data. J Comput Aided Mol Des 2017;31:319-328. [PMID: 27830428 PMCID: PMC5385323 DOI: 10.1007/s10822-016-9990-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2016] [Accepted: 10/24/2016] [Indexed: 11/11/2022]

Tripathi S, Vercruysse S, Chawla K, Christie KR, Blake JA, Huntley RP, Orchard S, Hermjakob H, Thommesen L, Lægreid A, Kuiper M. Gene regulation knowledge commons: community action takes care of DNA binding transcription factors. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw088. [PMID: 27270715 PMCID: PMC4911790 DOI: 10.1093/database/baw088] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/08/2015] [Accepted: 05/05/2016] [Indexed: 12/23/2022]

de Leeuw N, Dijkhuizen T, Hehir-Kwa JY, Carter NP, Feuk L, Firth HV, Kuhn RM, Ledbetter DH, Martin CL, van Ravenswaaij-Arts CMA, Scherer SW, Shams S, Van Vooren S, Sijmons R, Swertz M, Hastings R. Diagnostic interpretation of array data using public databases and internet sources. Hum Mutat 2016;33:930-40. [PMID: 26285306 DOI: 10.1002/humu.22049] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Indexed: 12/21/2022]

Hettne KM, Thompson M, van Haagen HHHBM, van der Horst E, Kaliyaperumal R, Mina E, Tatum Z, Laros JFJ, van Mulligen EM, Schuemie M, Aten E, Li TS, Bruskiewich R, Good BM, Su AI, Kors JA, den Dunnen J, van Ommen GJB, Roos M, 't Hoen PA, Mons B, Schultes EA. The Implicitome: A Resource for Rationalizing Gene-Disease Associations. PLoS One 2016;11:e0149621. [PMID: 26919047 PMCID: PMC4769089 DOI: 10.1371/journal.pone.0149621] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/22/2015] [Accepted: 02/03/2016] [Indexed: 11/19/2022]

Affiliation(s)

Kristina M. Hettne Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Mark Thompson Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Herman H. H. B. M. van Haagen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Eelke van der Horst Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Rajaram Kaliyaperumal Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Eleni Mina Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Zuotian Tatum Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Jeroen F. J. Laros Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Erik M. van Mulligen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
Martijn Schuemie Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
Emmelien Aten Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Tong Shu Li Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
Richard Bruskiewich STAR Informatics / Delphinai Corporation, Port Moody, BC, Canada
Benjamin M. Good Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
Andrew I. Su Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
Jan A. Kors Department of Medical Informatics, Erasmus University Medical Center Rotterdam, Rotterdam, The Netherlands
Johan den Dunnen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Gert-Jan B. van Ommen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Marco Roos Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Peter A.C. 't Hoen Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
Barend Mons Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
Erik A. Schultes Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; Leiden Institute for Advanced Computer Science, Leiden, The Netherlands

Dyke SOM, Philippakis AA, Rambla De Argila J, Paltoo DN, Luetkemeier ES, Knoppers BM, Brookes AJ, Spalding JD, Thompson M, Roos M, Boycott KM, Brudno M, Hurles M, Rehm HL, Matern A, Fiume M, Sherry ST. Consent Codes: Upholding Standard Data Use Conditions. PLoS Genet 2016;12:e1005772. [PMID: 26796797 PMCID: PMC4721915 DOI: 10.1371/journal.pgen.1005772] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Indexed: 11/18/2022]

Affiliation(s)

Stephanie O. M. Dyke Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
Anthony A. Philippakis Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
Jordi Rambla De Argila Centre for Genomic Regulation (CRG), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
Dina N. Paltoo Office of Science Policy, Office of the Director, National Institutes of Health, Bethesda, Maryland, United States of America
Erin S. Luetkemeier Office of Science Policy, Office of the Director, National Institutes of Health, Bethesda, Maryland, United States of America
Bartha M. Knoppers Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, Quebec, Canada
Anthony J. Brookes Department of Genetics, University of Leicester, Leicester, United Kingdom
J. Dylan Spalding European Molecular Biology Laboratory—European Bioinformatics Institute (EMBL—EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
Mark Thompson Human Genetics Department, Leiden University Medical Center, Leiden, The Netherlands
Marco Roos Human Genetics Department, Leiden University Medical Center, Leiden, The Netherlands
Kym M. Boycott Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, Ontario, Canada
Michael Brudno Centre for Computational Medicine, Hospital for Sick Children, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
Matthew Hurles Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
Heidi L. Rehm Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America; Department of Pathology, Brigham & Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
Andreas Matern Bioreference Laboratories, Inc., Elmwood Park, New Jersey, United States of America
Marc Fiume DNAstack, Toronto, Ontario, Canada
Stephen T. Sherry National Centre for Biotechnology Information, US National Library of Medicine, Bethesda, Maryland, United States of America

Womack RP. Research Data in Core Journals in Biology, Chemistry, Mathematics, and Physics. PLoS One 2015;10:e0143460. [PMID: 26636676 PMCID: PMC4670119 DOI: 10.1371/journal.pone.0143460] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 08/29/2015] [Accepted: 11/04/2015] [Indexed: 11/19/2022]

van Dam JCJ, Koehorst JJ, Schaap PJ, Martins dos Santos VAP, Suarez-Diez M. RDF2Graph a tool to recover, understand and validate the ontology of an RDF resource. J Biomed Semantics 2015;6:39. [PMID: 26500754 PMCID: PMC4619317 DOI: 10.1186/s13326-015-0038-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/23/2015] [Accepted: 09/23/2015] [Indexed: 11/10/2022]

Lopes P, Oliveira JL. An automated real-time integration and interoperability framework for bioinformatics. BMC Bioinformatics 2015;16:328. [PMID: 26464306 PMCID: PMC4603302 DOI: 10.1186/s12859-015-0761-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/23/2015] [Accepted: 10/06/2015] [Indexed: 11/29/2022]

Read KB, Sheehan JR, Huerta MF, Knecht LS, Mork JG, Humphreys BL. Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study. PLoS One 2015. [PMID: 26207759 PMCID: PMC4514623 DOI: 10.1371/journal.pone.0132735] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/19/2022]

Abstract

Objective

This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository.

Methods

We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed and deposited in PubMed Central (PMC) to identify those that indicate data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article.

Results

About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% whose datasets are invisible. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets per article, suggesting there were approximately 200,000 to 235,000 invisible datasets generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.

Conclusion

In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.

González-Beltrán A, Li P, Zhao J, Avila-Garcia MS, Roos M, Thompson M, van der Horst E, Kaliyaperumal R, Luo R, Lee TL, Lam TW, Edmunds SC, Sansone SA, Rocca-Serra P. From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics. PLoS One 2015;10:e0127612. [PMID: 26154165 PMCID: PMC4495984 DOI: 10.1371/journal.pone.0127612] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2014] [Accepted: 04/16/2015] [Indexed: 12/20/2022] Open

Abstract

MOTIVATION

Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed. The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler.

RESULTS

Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings. The models served as guides in the curation of scientific information and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.

AVAILABILITY

SOAPdenovo2 scripts, data, and results are available through the GigaScience Database: http://dx.doi.org/10.5524/100044; the workflows are available from GigaGalaxy: http://galaxy.cbiit.cuhk.edu.hk; and the representations using the ISA, NP, and RO models are available through the SOAPdenovo2 case study website http://isa-tools.github.io/soapdenovo2/.

CONTACT

philippe.rocca-serra@oerc.ox.ac.uk and susanna-assunta.sansone@oerc.ox.ac.uk.

Affiliation(s)

Alejandra González-Beltrán Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom
Peter Li GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
Jun Zhao InfoLab21, Lancaster University, Bailrigg, Lancaster, LA1 4WA, United Kingdom
Maria Susana Avila-Garcia Nuffield Department of Medicine, Experimental Medicine Division, John Radcliffe Hospital, Headley Way, Headington, Oxford, OX3 9DU, United Kingdom
Marco Roos Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
Mark Thompson Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
Eelke van der Horst Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
Rajaram Kaliyaperumal Department of Human Genetics, Leiden University Medical Center, P.O. Box 9600, 2300 RC Leiden, The Netherlands
Ruibang Luo HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
Tin-Lap Lee School of Biomedical Sciences and CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, Hong Kong, People’s Republic of China
Tak-wah Lam HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong, People’s Republic of China
Scott C. Edmunds GigaScience, BGI HK Research Institute, 16 Dai Fu Street, Tai Po Industrial Estate, Hong Kong, People’s Republic of China
Susanna-Assunta Sansone Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom
Philippe Rocca-Serra Oxford e-Research Centre, University of Oxford, 7 Keble Road, OX1 3QG, United Kingdom

Uddin S, Khan A, Baur LA. A framework to explore the knowledge structure of multidisciplinary research fields. PLoS One 2015;10:e0123537. [PMID: 25915521 PMCID: PMC4410998 DOI: 10.1371/journal.pone.0123537] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2014] [Accepted: 03/04/2015] [Indexed: 01/08/2023] Open

Bravo À, Piñero J, Queralt-Rosinach N, Rautschka M, Furlong LI. Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics 2015;16:55. [PMID: 25886734 PMCID: PMC4466840 DOI: 10.1186/s12859-015-0472-9] [Citation(s) in RCA: 116] [Impact Index Per Article: 12.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2014] [Accepted: 01/19/2015] [Indexed: 11/23/2022] Open

Abstract

Background

Current biomedical research needs to leverage and exploit the large amount of information reported in scientific publications. Automated text mining approaches, in particular those aimed at finding relationships between entities, are key for identification of actionable knowledge from free text repositories. We present the BeFree system aimed at identifying relationships between biomedical entities with a special focus on genes and their associated diseases.

Results

By exploiting morpho-syntactic information of the text, BeFree is able to identify gene-disease, drug-disease and drug-target associations with state-of-the-art performance. The application of BeFree to real-case scenarios shows its effectiveness in extracting information relevant for translational research. We show the value of the gene-disease associations extracted by BeFree through a number of analyses and integration with other data sources. BeFree succeeds in identifying genes associated with a major cause of morbidity worldwide, depression, which are not present in other public resources. Moreover, large-scale extraction and analysis of gene-disease associations, and integration with current biomedical knowledge, provided interesting insights on the kind of information that can be found in the literature, and raised challenges regarding data prioritization and curation. We found that only a small proportion of the gene-disease associations discovered by using BeFree is collected in expert-curated databases. Thus, there is a pressing need to find alternative strategies to manual curation, in order to review, prioritize and curate text-mining data and incorporate it into domain-specific databases. We present our strategy for data prioritization and discuss its implications for supporting biomedical research and applications.

Conclusions

BeFree is a novel text mining system that performs competitively for the identification of gene-disease, drug-disease and drug-target associations. Our analyses show that mining only a small fraction of MEDLINE results in a large dataset of gene-disease associations, and only a small proportion of this dataset is actually recorded in curated resources (2%), raising several issues on data prioritization and curation. We propose that joint analysis of text-mined data with data curated by experts is a suitable approach to both assess data quality and highlight novel and interesting information.

Eijssen L, Evelo C, Kok R, Mons B, Hooft R. The Dutch Techcentre for Life Sciences: Enabling data-intensive life science research in the Netherlands. F1000Res 2015;4:33. [PMID: 26913186 PMCID: PMC4743138 DOI: 10.12688/f1000research.6009.2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/04/2016] [Indexed: 11/20/2022] Open

Yu Q, Ding Y, Song M, Song S, Liu J, Zhang B. Tracing database usage: Detecting main paths in database link networks. J Informetr 2015. [DOI: 10.1016/j.joi.2014.10.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]

Rybinski M, Aldana-Montes J. Calculating semantic relatedness for biomedical use in a knowledge-poor environment. BMC Bioinformatics 2014;15 Suppl 14:S2. [PMID: 25471751 PMCID: PMC4255738 DOI: 10.1186/1471-2105-15-s14-s2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

Background

Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications relevant to the biomedical domain, specifically due to the huge amount of new findings being published each year. Most methods benefit from making use of highly specific resources, thus reducing their usability in many real world scenarios that differ from the original assumptions. In this paper we present a simple resource-efficient method for calculating semantic relatedness in a knowledge-poor environment. The method obtains results comparable to state-of-the-art methods, while being more generic and flexible. The solution being presented here was designed to use only a relatively generic and small document corpus and its statistics, without referring to a previously defined knowledge base, thus it does not assume a 'closed' problem.

Results

We propose a method in which computation for two input texts is based on the idea of comparing the vocabulary associated with the best-fit documents related to those texts. As keyterm extraction is a costly process, it is done in a preprocessing step on a 'per-document' basis in order to limit the on-line processing. The actual computations are executed in a compact vector space, limited by the most informative extraction results. The method has been evaluated on five direct benchmarks by calculating correlation coefficients w.r.t. average human answers. It has also been used on Gene-Disease and Disease-Disease data pairs to highlight its potential use as a data analysis tool. Apart from comparisons with reported results, some interesting features of the method have been studied, i.e. the relationship between result quality, efficiency and applicable trimming threshold for size reduction. Experimental evaluation shows that the presented method obtains results that are comparable with current state-of-the-art methods, even surpassing them on a majority of the benchmarks. Additionally, a possible usage scenario for the method is showcased with a real-world data experiment.

Conclusions

Our method improves flexibility of the existing methods without a notable loss of quality. It is a legitimate alternative to the costly construction of specialized knowledge-rich resources.

Hettne KM, Dharuri H, Zhao J, Wolstencroft K, Belhajjame K, Soiland-Reyes S, Mina E, Thompson M, Cruickshank D, Verdes-Montenegro L, Garrido J, de Roure D, Corcho O, Klyne G, van Schouwen R, ‘t Hoen PAC, Bechhofer S, Goble C, Roos M. Structuring research methods and data with the research object model: genomics workflows as a case study. J Biomed Semantics 2014;5:41. [PMID: 25276335 PMCID: PMC4177597 DOI: 10.1186/2041-1480-5-41] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2013] [Accepted: 07/29/2014] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows.

RESULTS

We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?", and "which particular conclusions were drawn from a particular workflow?".

CONCLUSIONS

Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well.

AVAILABILITY

The Research Object is available at http://www.myexperiment.org/packs/428; the Wf4Ever Research Object Model is available at http://wf4ever.github.io/ro.

Good BM, Ainscough BJ, McMichael JF, Su AI, Griffith OL. Organizing knowledge to enable personalization of medicine in cancer. Genome Biol 2014;15:438. [PMID: 25222080 PMCID: PMC4281950 DOI: 10.1186/s13059-014-0438-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open

Clark T, Ciccarese PN, Goble CA. Micropublications: a semantic model for claims, evidence, arguments and annotations in biomedical communications. J Biomed Semantics 2014;5:28. [PMID: 26261718 PMCID: PMC4530550 DOI: 10.1186/2041-1480-5-28] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2013] [Accepted: 06/16/2014] [Indexed: 11/10/2022] Open

Abstract

Background

Scientific publications are documentary representations of defeasible arguments, supported by data and repeatable methods. They are the essential mediating artifacts in the ecosystem of scientific communications. The institutional “goal” of science is publishing results. The linear document publication format, dating from 1665, has survived transition to the Web.

Intractable publication volumes; the difficulty of verifying evidence; and observed problems in evidence and citation chains suggest a need for a web-friendly and machine-tractable model of scientific publications. This model should support: digital summarization, evidence examination, challenge, verification and remix, and incremental adoption. Such a model must be capable of expressing a broad spectrum of representational complexity, ranging from minimal to maximal forms.

Results

The micropublications semantic model of scientific argument and evidence provides these features. Micropublications support natural language statements; data; methods and materials specifications; discussion and commentary; challenge and disagreement; as well as allowing many kinds of statement formalization.

The minimal form of a micropublication is a statement with its attribution. The maximal form is a statement with its complete supporting argument, consisting of all relevant evidence, interpretations, discussion and challenges brought forward in support of or opposition to it. Micropublications may be formalized and serialized in multiple ways, including in RDF. They may be added to publications as stand-off metadata.

An OWL 2 vocabulary for micropublications is available at http://purl.org/mp. A discussion of this vocabulary, along with RDF examples from the case studies, appears as "OWL Vocabulary and RDF Examples" in Additional file 1.

Conclusion

Micropublications, because they model evidence and allow qualified, nuanced assertions, can play essential roles in the scientific communications ecosystem in places where simpler, formalized and purely statement-based models, such as the nanopublications model, will not be sufficient. At the same time they will add significant value to, and are intentionally compatible with, statement-based formalizations.

We suggest that micropublications, generated by useful software tools supporting such activities as writing, editing, reviewing, and discussion, will be of great value in improving the quality and tractability of biomedical communications.

Ding Y, Zhang G, Chambers T, Song M, Wang X, Zhai C. Content-based citation analysis: The next generation of citation analysis. J Assoc Inf Sci Technol 2014. [DOI: 10.1002/asi.23256] [Citation(s) in RCA: 130] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]

Zdrazil B, Chichester C, Zander Balderud L, Engkvist O, Gaulton A, Overington JP. Transporter assays and assay ontologies: useful tools for drug discovery. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014;12:e47-e54. [PMID: 25027375 DOI: 10.1016/j.ddtec.2014.03.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]

Belter CW. Measuring the value of research data: a citation analysis of oceanographic data sets. PLoS One 2014;9:e92590. [PMID: 24671177 PMCID: PMC3966791 DOI: 10.1371/journal.pone.0092590] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2013] [Accepted: 02/25/2014] [Indexed: 11/24/2022] Open

Dumontier M, Baker CJ, Baran J, Callahan A, Chepelev L, Cruz-Toledo J, Del Rio NR, Duck G, Furlong LI, Keath N, Klassen D, McCusker JP, Queralt-Rosinach N, Samwald M, Villanueva-Rosales N, Wilkinson MD, Hoehndorf R. The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. J Biomed Semantics 2014;5:14. [PMID: 24602174 PMCID: PMC4015691 DOI: 10.1186/2041-1480-5-14] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2013] [Accepted: 02/02/2014] [Indexed: 11/10/2022] Open

Gradmann S. From containers to content to context. JOURNAL OF DOCUMENTATION 2014. [DOI: 10.1108/jd-05-2013-0058] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Abstract

Purpose

The aim of this paper is to reposition the research library in the context of the changing information and knowledge architecture at the end of the “Gutenberg Parenthesis” and as part of the rapidly emerging “semantic” environment of the Linked Open Data paradigm. Understanding this process requires a good understanding of the evolution of the “document” notion in the passage from print based culture to the distributed hypertextual and RDF based information architecture of the WWW.

Design/methodology/approach

These objectives are reached using literature study and a descriptive historical approach as well as text mining techniques using Google nGrams as a data source.

Findings

The paper presents a proposal for effectively repositioning research libraries in the context of eScience and eScholarship as well as clear indications of the proposed repositioning already taking place. Furthermore, a new perspective of the “document” notion is provided.

Practical implications

The evolution described in the contribution creates opportunities for libraries to reposition themselves as aggregators and selectors of content and as contextualising agents as part of future Linked Data based scholarly research environments provided they are able and ready to operate the related cultural changes.

Originality/value

The paper will be useful for practitioners in search of strategic guidance for repositioning their librarian institutions in a context of ever increasing competition for scarce funding resources.

Jimeno Yepes A, Verspoor K. Literature mining of genetic variants for curation: quantifying the importance of supplementary material. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014;2014:bau003. [PMID: 24520105 PMCID: PMC3920087 DOI: 10.1093/database/bau003] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Abstract

A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains 'all of the information', and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication.

Patterson DJ, Egloff W, Agosti D, Eades D, Franz N, Hagedorn G, Rees JA, Remsen DP. Scientific names of organisms: attribution, rights, and licensing. BMC Res Notes 2014;7:79. [PMID: 24495358 PMCID: PMC3922623 DOI: 10.1186/1756-0500-7-79] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2013] [Accepted: 01/28/2014] [Indexed: 11/10/2022] Open

Shabo A. Towards a translational health information language. EPMA J 2014. [PMCID: PMC4125964 DOI: 10.1186/1878-5085-5-s1-a51] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]

Elliott JH, Turner T, Clavisi O, Thomas J, Higgins JPT, Mavergames C, Gruen RL. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med 2014;11:e1001603. [PMID: 24558353 PMCID: PMC3928029 DOI: 10.1371/journal.pmed.1001603] [Citation(s) in RCA: 307] [Impact Index Per Article: 30.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open

Qin H, Davis L, Mayernik M, Lankao PR, D'Ignazio J, Alston P. Variables As Currency: Linking Meta-Analysis Research and Data Paths in Sciences. DATA SCIENCE JOURNAL 2014. [DOI: 10.2481/dsj.14-030] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Livingston KM, Bada M, Hunter LE, Verspoor K. Representing annotation compositionality and provenance for the Semantic Web. J Biomed Semantics 2013;4:38. [PMID: 24268021 PMCID: PMC4129183 DOI: 10.1186/2041-1480-4-38] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2013] [Accepted: 09/20/2013] [Indexed: 12/03/2022] Open

Rebholz-Schuhmann D, Grabmüller C, Kavaliauskas S, Croset S, Woollard P, Backofen R, Filsell W, Clark D. A case study: semantic integration of gene-disease associations for type 2 diabetes mellitus from literature and biomedical data resources. Drug Discov Today 2013;19:882-9. [PMID: 24201223 DOI: 10.1016/j.drudis.2013.10.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2012] [Revised: 09/24/2013] [Accepted: 10/28/2013] [Indexed: 10/26/2022]