1
van Kampen AHC, Mahamune U, Jongejan A, van Schaik BDC, Balashova D, Lashgari D, Pras-Raves M, Wever EJM, Dane AD, García-Valiente R, Moerland PD. ENCORE: a practical implementation to improve reproducibility and transparency of computational research. Nat Commun 2024; 15:8117. [PMID: 39284801 PMCID: PMC11405857 DOI: 10.1038/s41467-024-52446-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 09/06/2024] [Indexed: 09/20/2024] Open
Abstract
Reproducibility of computational research is often challenging despite established guidelines and best practices. Translating these guidelines into practical applications remains difficult. Here, we present ENCORE, an approach to enhance transparency and reproducibility by guiding researchers in how to structure and document a computational project. ENCORE builds on previous efforts in computational reproducibility and integrates all project components into a standardized file system structure. It utilizes pre-defined files as documentation templates, leverages GitHub for software versioning, and includes an HTML-based navigator. ENCORE is designed to be agnostic to the type of computational project, data, programming language, and ICT infrastructure, and does not rely on specific software tools. We also share our group's experience using ENCORE, highlighting that the most significant challenge to the routine adoption of approaches like ours is the lack of incentives to motivate researchers to dedicate sufficient time and effort to ensure reproducibility.
Affiliation(s)
- Antoine H C van Kampen
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
  - Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, Amsterdam, Netherlands
- Utkarsh Mahamune
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Aldo Jongejan
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
- Barbera D C van Schaik
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Daria Balashova
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Danial Lashgari
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Mia Pras-Raves
  - Amsterdam UMC, University of Amsterdam, Department of Clinical Chemistry, Laboratory Genetic Metabolic Diseases, Meibergdreef 9, Amsterdam, Netherlands
  - Core Facility Metabolomics, Amsterdam UMC, Amsterdam, Netherlands
- Eric J M Wever
  - Amsterdam UMC, University of Amsterdam, Department of Clinical Chemistry, Laboratory Genetic Metabolic Diseases, Meibergdreef 9, Amsterdam, Netherlands
  - Core Facility Metabolomics, Amsterdam UMC, Amsterdam, Netherlands
- Adrie D Dane
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Core Facility Metabolomics, Amsterdam UMC, Amsterdam, Netherlands
- Rodrigo García-Valiente
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
- Perry D Moerland
  - Amsterdam UMC, University of Amsterdam, Bioinformatics Laboratory, Epidemiology and Data Science, Meibergdreef 9, Amsterdam, Netherlands
  - Amsterdam Public Health, Methodology, Amsterdam, Netherlands
  - Amsterdam Institute for Immunology and Infectious Diseases, Amsterdam, Netherlands
2
Lin RZ, Amith MT, Wang CX, Strickley J, Tao C. Dermoscopy Differential Diagnosis Explorer (D3X) Ontology to Aggregate and Link Dermoscopic Patterns to Differential Diagnoses: Development and Usability Study. JMIR Med Inform 2024; 12:e49613. [PMID: 38904996 PMCID: PMC11226929 DOI: 10.2196/49613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Revised: 04/18/2024] [Accepted: 05/04/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND: Dermoscopy is a growing field that uses microscopy to allow dermatologists and primary care physicians to identify skin lesions. For a given skin lesion, a wide variety of differential diagnoses exist, which may be challenging for inexperienced users to name and understand.
OBJECTIVE: In this study, we describe the creation of the dermoscopy differential diagnosis explorer (D3X), an ontology linking dermoscopic patterns to differential diagnoses.
METHODS: Existing ontologies that were incorporated into D3X include the elements of visuals ontology and dermoscopy elements of visuals ontology, which connect visual features to dermoscopic patterns. A list of differential diagnoses for each pattern was generated from the literature and in consultation with domain experts. Open-source images were incorporated from DermNet, Dermoscopedia, and open-access research papers.
RESULTS: D3X was encoded in the OWL 2 web ontology language and includes 3041 logical axioms, 1519 classes, 103 object properties, and 20 data properties. We compared D3X with publicly available ontologies in the dermatology domain using a semiotic theory-driven metric to measure the innate qualities of D3X with others. The results indicate that D3X is adequately comparable with other ontologies of the dermatology domain.
CONCLUSIONS: The D3X ontology is a resource that can link and integrate dermoscopic differential diagnoses and supplementary information with existing ontology-based resources. Future directions include developing a web application based on D3X for dermoscopy education and clinical practice.
Affiliation(s)
- Rebecca Z Lin
  - Division of Dermatology, Washington University School of Medicine, St. Louis, MO, United States
- Muhammad Tuan Amith
  - Department of Information Science, University of North Texas, Denton, TX, United States
  - Department of Biostatistics and Data Science, The University of Texas Medical Branch, Galveston, TX, United States
  - Department of Internal Medicine, The University of Texas Medical Branch, Galveston, TX, United States
- Cynthia X Wang
  - Department of Dermatology, Kaiser Permanente Redwood City Medical Center, Redwood City, CA, United States
- John Strickley
  - Division of Dermatology, University of Louisville, Louisville, KY, United States
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States
3
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Frontiers in Plant Science 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse, and the potential of answering novel research questions. Ontologies are a useful tool for semantically tagging datasets, as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine-interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
  - Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
  - Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
  - Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
  - Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
4
Wu W, Luo S, Wang H. Design of an automatic landscape design system in smart cities based on vision computing. Mathematical Biosciences and Engineering: MBE 2023; 20:16383-16400. [PMID: 37920017 DOI: 10.3934/mbe.2023731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2023]
Abstract
In future smart cities, automatic landscape design can be viewed as a promising intelligent application that reduces reliance on expert labor. Because it is a visual sensing activity, it calls for a robust interaction platform with a strong capacity for visual information fusion. To address this, this paper integrates vision computing and designs an automatic landscape design system for smart cities. The design framework comprises three aspects: function analysis, structure design, and implementation. Visual information processing runs through all three. The generation process of a landscape design is then simulated in detail via a systematic case study. To demonstrate the significance of visual information processing in our proposal, this article uses a model analysis method to compare the effects of traditional data processing technology and visual data processing technology. The analysis results show that vision computing technology provides technical support for landscape design. We also carry out performance testing of the designed system and present the evaluation results in visual form. The designed system is a suitable prototype that can be developed into a realistic engineering system with further work.
Affiliation(s)
- Wei Wu
  - School of Civil Engineering, Architecture and Environment, Hubei University of Technology, Wuhan, Hubei 430068, China
- Shicheng Luo
  - School of Civil Engineering, Architecture and Environment, Hubei University of Technology, Wuhan, Hubei 430068, China
- Hongying Wang
  - School of Civil Engineering, Architecture and Environment, Hubei University of Technology, Wuhan, Hubei 430068, China
5
Amith MT, Cui L, Roberts K, Tao C. Application of an ontology for model cards to generate computable artifacts for linking machine learning information from biomedical research. Proceedings of the ... International World-Wide Web Conference. International WWW Conference 2023; 2023:820-825. [PMID: 38327770 PMCID: PMC10848146 DOI: 10.1145/3543873.3587601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Model card reports provide a transparent description of machine learning models, including information about their evaluation, limitations, intended use, etc. Federal health agencies have expressed an interest in model card reports for research studies using machine learning-based AI. Previously, we developed an ontology model for model card reports to structure and formalize these reports. In this paper, we demonstrate a Java-based library (OWL API, FaCT++) that leverages our ontology to publish computable model card reports. We discuss future directions and other use cases that highlight the applicability and feasibility of ontology-driven systems to support FAIR challenges.
Affiliation(s)
- Licong Cui
  - The University of Texas Health Science Center at Houston, USA
- Kirk Roberts
  - The University of Texas Health Science Center at Houston, USA
- Cui Tao
  - The University of Texas Health Science Center at Houston, USA
6
Du X, Dastmalchi F, Ye H, Garrett TJ, Diller MA, Liu M, Hogan WR, Brochhausen M, Lemas DJ. Evaluating LC-HRMS metabolomics data processing software using FAIR principles for research software. Metabolomics 2023; 19:11. [PMID: 36745241 DOI: 10.1007/s11306-023-01974-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/08/2022] [Accepted: 01/20/2023] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Liquid chromatography-high resolution mass spectrometry (LC-HRMS) is a popular approach for metabolomics data acquisition and requires many data processing software tools. The FAIR Principles - Findability, Accessibility, Interoperability, and Reusability - were proposed to promote open science and reusable data management, and to maximize the benefit obtained from contemporary and formal scholarly digital publishing. More recently, the FAIR principles were extended to include Research Software (FAIR4RS).
AIM OF REVIEW: This study facilitates open science in metabolomics by providing an implementation solution for adopting FAIR4RS in LC-HRMS metabolomics data processing software. We believe our evaluation guidelines and results can help improve the FAIRness of research software.
KEY SCIENTIFIC CONCEPTS OF REVIEW: We evaluated 124 LC-HRMS metabolomics data processing software tools obtained from a systematic review and selected 61 for detailed evaluation using FAIR4RS-related criteria, which were extracted from the literature along with internal discussions. We assigned each criterion one or more FAIR4RS categories through discussion. The minimum, median, and maximum percentages of criteria fulfillment of software were 21.6%, 47.7%, and 71.8%. Statistical analysis revealed no significant improvement in FAIRness over time. We identified four criteria that cover multiple FAIR4RS categories but had a low percentage of fulfillment: (1) no software had semantic annotation of key information; (2) only 6.3% of evaluated software were registered to Zenodo and received DOIs; (3) only 14.5% of selected software had official software containerization or a virtual machine; (4) only 16.7% of evaluated software had fully documented functions in code. Based on these results, we discussed improvement strategies and future directions.
Affiliation(s)
- Xinsong Du
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Farhad Dastmalchi
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Hao Ye
  - Health Science Center Libraries, University of Florida, Florida, USA
- Timothy J Garrett
  - Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Florida, USA
- Matthew A Diller
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mei Liu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
- Mathias Brochhausen
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, USA
- Dominick J Lemas
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, USA
  - Department of Obstetrics and Gynecology, University of Florida College of Medicine, Florida, Gainesville, United States
  - Center for Perinatal Outcomes Research, University of Florida College of Medicine, Gainesville, United States
7
Keshavarzi M, Ghaffary HR. An ontology-driven framework for knowledge representation of digital extortion attacks. Computers in Human Behavior 2023; 139:107520. [PMID: 36268220 PMCID: PMC9557090 DOI: 10.1016/j.chb.2022.107520] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/06/2022] [Revised: 10/02/2022] [Accepted: 10/07/2022] [Indexed: 11/22/2022]
Abstract
With the COVID-19 pandemic and the growing influence of the Internet in critical sectors of industry and society, cyberattacks have not declined but have risen sharply. Meanwhile, ransomware is at the forefront of the most devastating threats and has given rise to a lucrative illegal business. Due to the proliferation and variety of ransomware forays, there is a need for a new theory of categories. The intricacy and multiplicity of components involved in digital extortion call for a knowledge representation system that can organize large volumes of information from heterogeneous sources in a formal structured format and infer new knowledge from it. This paper proposes and develops a dedicated ontology of digital blackmail, called Rantology, with a particular focus on ransomware attacks. The logic encoded in this ontology makes it possible to assess the maliciousness of programs based on various factors, including called API functions and their behaviors. The proposed framework can be used to facilitate interoperability between cybersecurity experts and knowledge-based systems, and to identify sensitive points for surveillance. The evaluation results, based on several criteria, confirm the adequacy of the suggested ontology in terms of clarity, modularity, consistency, coverage, and inheritance richness.
8
Ontology-Based Semantic Checking of Data in Railway Infrastructure Information Systems. Foundations of Computing and Decision Sciences 2022. [DOI: 10.2478/fcds-2022-0016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022] Open
Abstract
Semantic checking of railway infrastructure information support data is one way to improve the consistency of information system data and, as a result, increase the safety of train traffic. Existing ontological developments have demonstrated the applicability of description logic for modelling railway transport, but have not paid enough attention to the structure of data resources and to railway regulatory support. In this work, the tabular presentation of data and the rules of railway transport regulations are formalized, using the example of a connection track passport and temporary speed restrictions, by ontological means and with data wrangling and extraction tools. Ontologies of data resources in various formats and of railway station infrastructure, as well as tools for converting and extracting data, have been developed. The semantic checking of the compliance of railway information system data with regulatory documents in terms of the connection track passport is carried out on the basis of a multi-level concretization model and integration of ontologies. The mechanisms for implementing the constituent ontologies and their integration are demonstrated by an example. Further research includes ontological checking of natural language normative documents of railway transport.
9
Amith MT, Cui L, Zhi D, Roberts K, Jiang X, Li F, Yu E, Tao C. Toward a standard formal semantic representation of the model card report. BMC Bioinformatics 2022; 23:281. [PMID: 35836130 PMCID: PMC9284683 DOI: 10.1186/s12859-022-04797-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/13/2022] [Accepted: 06/15/2022] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Model card reports aim to provide an informative and transparent description of machine learning models to stakeholders. This report document is of interest to the National Institutes of Health's Bridge2AI initiative to address the FAIR challenges with artificial intelligence-based machine learning models for biomedical research. We present our early undertaking in developing an ontology for capturing the conceptual-level information embedded in model card reports.
RESULTS: Sourcing from existing ontologies and developing the core framework, we generated the Model Card Report Ontology. Our development efforts yielded an OWL2-based artifact that represents and formalizes model card report information. The current release of this ontology utilizes standard concepts and properties from OBO Foundry ontologies. Also, the software reasoner indicated no logical inconsistencies with the ontology. With sample model cards of machine learning models for bioinformatics research (HIV social networks and adverse outcome prediction for stent implantation), we showed the coverage and usefulness of our model in transforming static model card reports to a computable format for machine-based processing.
CONCLUSIONS: The benefit of our work is that it utilizes expansive and standard terminologies and scientific rigor promoted by biomedical ontologists, as well as generating an avenue to make model cards machine-readable using semantic web technology. Our future goal is to assess the veracity of our model and later expand the model to include additional concepts to address terminological gaps. We discuss tools and software that will utilize our ontology for potential application services.
Affiliation(s)
- Muhammad Tuan Amith
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
- Licong Cui
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
- Degui Zhi
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
- Kirk Roberts
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
- Fang Li
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
- Evan Yu
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
- Cui Tao
  - School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX, USA (GRID: grid.267308.8; ISNI: 0000 0000 9206 2401)
10
ReqTagger: A Rule-Based Tagger for Automatic Glossary of Terms Extraction from Ontology Requirements. Foundations of Computing and Decision Sciences 2022. [DOI: 10.2478/fcds-2022-0003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Abstract
Glossary of Terms extraction from textual requirements is an important step in ontology engineering methodologies. Although it was initially intended to be performed manually, recent years have shown that some degree of automation is possible. Building on these promising approaches, we introduce a novel, human-interpretable, rule-based method named ReqTagger, which can automatically extract candidates for ontology entities (classes or instances) and relations (data or object properties) from textual requirements. We compare ReqTagger to existing automatic methods on an evaluation benchmark consisting of over 550 requirements tagged with over 1700 entities and relations expected to be extracted. We discuss the quality of ReqTagger and provide details showing why it outperforms other methods. We also publish both the evaluation dataset and the implementation of ReqTagger.
11
Du X, Aristizabal-Henao JJ, Garrett TJ, Brochhausen M, Hogan WR, Lemas DJ. A Checklist for Reproducible Computational Analysis in Clinical Metabolomics Research. Metabolites 2022; 12:87. [PMID: 35050209 PMCID: PMC8779534 DOI: 10.3390/metabo12010087] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/10/2021] [Revised: 12/25/2021] [Accepted: 01/10/2022] [Indexed: 12/15/2022] Open
Abstract
Clinical metabolomics has emerged as a novel approach for biomarker discovery with the translational potential to guide next-generation therapeutics and precision health interventions. However, reproducibility in clinical research employing metabolomics data is challenging. Checklists are a helpful tool for promoting reproducible research. Existing checklists that promote reproducible metabolomics research primarily focus on metadata and may not be sufficient to ensure reproducible metabolomics data processing. This paper provides a checklist of actions that researchers need to take to make computational steps reproducible for clinical metabolomics studies. We developed an eight-item checklist that includes criteria related to reusable data sharing and reproducible computational workflow development. We also provide recommended tools and resources to complete each item, as well as a GitHub project template to guide the process. The checklist is concise and easy to follow. Studies that follow this checklist and use the recommended resources may help other researchers reproduce metabolomics results easily and efficiently.
Affiliation(s)
- Xinsong Du
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Timothy J. Garrett
  - Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Mathias Brochhausen
  - Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- William R. Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Dominick J. Lemas
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
12
Schindler D, Bensmann F, Dietze S, Krüger F. The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central. PeerJ Comput Sci 2022; 8:e835. [PMID: 35111920 PMCID: PMC8771769 DOI: 10.7717/peerj-cs.835] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/08/2021] [Accepted: 12/07/2021] [Indexed: 06/06/2023]
Abstract
Science across all disciplines has become increasingly data-driven, leading to additional needs with respect to software for collecting, processing and analysing data. Thus, transparency about software used as part of the scientific process is crucial to understand the provenance of individual research data and insights, is a prerequisite for reproducibility, and can enable macro-analysis of the evolution of scientific methods over time. However, a lack of rigor in software citation practices renders the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices facilitated through an unprecedented knowledge graph of software mentions and affiliated metadata generated through supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions, and outperforms the state-of-the-art significantly, leading to the most comprehensive corpus of 11.8 M software mentions that are described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. To the best of our knowledge, this is the most comprehensive analysis of software use and citation to date, and all data and models are shared publicly to facilitate further research into scientific use and citation of software.
Affiliation(s)
- David Schindler
  - Institute of Communications Engineering, University of Rostock, Rostock, Germany
- Felix Bensmann
  - GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
- Stefan Dietze
  - GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
  - Heinrich-Heine-University, Düsseldorf, Germany
- Frank Krüger
  - Institute of Communications Engineering, University of Rostock, Rostock, Germany
  - Department Knowledge, Culture & Transformation, University of Rostock, Rostock, Germany
|
13
|
Leipzig J, Nüst D, Hoyt CT, Ram K, Greenberg J. The role of metadata in reproducible computational research. Patterns (New York, N.Y.) 2021; 2:100322. [PMID: 34553169 PMCID: PMC8441584 DOI: 10.1016/j.patter.2021.100322] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 01/22/2023]
Abstract
Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data to published results. In addition to its role in research integrity, improving the reproducibility of scientific studies can accelerate evaluation and reuse. This potential and wide support for the FAIR principles have motivated interest in metadata standards supporting reproducibility. Metadata provide context and provenance to raw data and methods and are essential to both discovery and validation. Despite this shared connection with scientific data, few studies have explicitly described how metadata enable reproducible computational research. This review employs a functional content analysis to identify metadata standards that support reproducibility across an analytic stack consisting of input data, tools, notebooks, pipelines, and publications. Our review provides background context, explores gaps, and discovers component trends of embeddedness and methodology weight from which we derive recommendations for future work.
Affiliation(s)
- Jeremy Leipzig
  - Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
- Daniel Nüst
  - Institute for Geoinformatics, University of Münster, Münster, Germany
- Karthik Ram
  - Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA
- Jane Greenberg
  - Metadata Research Center, College of Computing and Informatics, Drexel University, Philadelphia, PA, USA
|
14
|
Mayer G, Müller W, Schork K, Uszkoreit J, Weidemann A, Wittig U, Rey M, Quast C, Felden J, Glöckner FO, Lange M, Arend D, Beier S, Junker A, Scholz U, Schüler D, Kestler HA, Wibberg D, Pühler A, Twardziok S, Eils J, Eils R, Hoffmann S, Eisenacher M, Turewicz M. Implementing FAIR data management within the German Network for Bioinformatics Infrastructure (de.NBI) exemplified by selected use cases. Brief Bioinform 2021; 22:bbab010. [PMID: 33589928 PMCID: PMC8425304 DOI: 10.1093/bib/bbab010] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/17/2020] [Revised: 12/21/2020] [Accepted: 01/06/2021] [Indexed: 12/21/2022] Open
Abstract
This article describes some use case studies and self-assessments of FAIR status of de.NBI services to illustrate the challenges and requirements for the definition of the needs of adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large distributed bioinformatics infrastructure. We address the challenge of heterogeneity of wet lab technologies, data, metadata, software, computational workflows and the levels of implementation and monitoring of FAIR principles within the different bioinformatics sub-disciplines joined in de.NBI. On the one hand, this broad service landscape and the excellent network of experts are a strong basis for the development of useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams renders FAIR compliance challenging.
Affiliation(s)
- Gerhard Mayer
  - Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
  - Ulm University, Institute of Medical Systems Biology, Ulm, Germany
- Wolfgang Müller
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Karin Schork
  - Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Julian Uszkoreit
  - Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Andreas Weidemann
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Ulrike Wittig
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Maja Rey
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), Scientific Databases and Visualization Group, Heidelberg, Germany
- Janine Felden
  - Jacobs University Bremen gGmbH, Bremen, Germany
  - University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
- Frank Oliver Glöckner
  - Jacobs University Bremen gGmbH, Bremen, Germany
  - University of Bremen, MARUM - Center for Marine Environmental Sciences, Bremen, Germany
  - Alfred Wegener Institute - Helmholtz Center for Polar- and Marine Research, Bremerhaven, Germany
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Sebastian Beier
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Astrid Junker
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Danuta Schüler
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany
- Hans A Kestler
  - Ulm University, Institute of Medical Systems Biology, Ulm, Germany
  - Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena
- Daniel Wibberg
  - Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Alfred Pühler
  - Bielefeld University, Center for Biotechnology (CeBiTec), Bielefeld, Germany
- Sven Twardziok
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Jürgen Eils
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
- Roland Eils
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health (BIH), Center for Digital Health, Berlin, Germany
  - Heidelberg University Hospital and BioQuant, Health Data Science Unit, Heidelberg, Germany
- Steve Hoffmann
  - Leibniz Institute on Ageing - Fritz Lipmann Institute, Jena
- Martin Eisenacher
  - Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
- Michael Turewicz
  - Ruhr University Bochum, Faculty of Medicine, Medizinisches Proteom-Center, Bochum, Germany
  - Ruhr University Bochum, Center for Protein Diagnostics (ProDi), Medical Proteome Analysis, Bochum, Germany
|
15
|
Ison J, Ienasescu H, Rydza E, Chmura P, Rapacki K, Gaignard A, Schwämmle V, van Helden J, Kalaš M, Ménager H. biotoolsSchema: a formalized schema for bioinformatics software description. Gigascience 2021; 10:giaa157. [PMID: 33506265 PMCID: PMC7842104 DOI: 10.1093/gigascience/giaa157] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/08/2020] [Revised: 11/10/2020] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description (and cataloguing) of bioinformatics resources. FINDINGS: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. CONCLUSIONS: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
Affiliation(s)
- Jon Ison
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
- Hans Ienasescu
  - National Life Science Supercomputing Center, Technical University of Denmark, Building 208, DK-2800 Kongens Lyngby, Denmark
- Emil Rydza
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Piotr Chmura
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3B, 2200 København, Denmark
- Kristoffer Rapacki
  - Department of Health Technology, Ørsteds Plads, Building 345C, DK-2800 Kongens, Lyngby, Denmark
- Alban Gaignard
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - L'institut du Thorax, INSERM, CNRS, University of Nantes, 44007 Nantes, France
- Veit Schwämmle
  - Department of Biochemistry and Molecular Biology and VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Campusvej 55, 5230 Odense, Denmark
- Jacques van Helden
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Département de Biologie, Aix-Marseille Université (AMU), 3 place Victor Hugo, 13003 Marseille, France
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, N-5008 Bergen, Norway
- Hervé Ménager
  - CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, 2 rue Gaston Crémieux, F-91000 Evry, France
  - Hub de Bioinformatique et Biostatistique–Département Biologie Computationnelle, Institut Pasteur, USR 3756, CNRS, Paris 75015, France
|
16
|
Volk M, Staegemann D, Jamous N, Pohl M, Turowski K. Providing Clarity on Big Data Technologies. International Journal of Intelligent Information Technologies 2020. [DOI: 10.4018/ijiit.2020040103] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Big Data is a term that gained popularity due to its potential benefits in various fields, and is increasingly being used. However, there are still many gaps and challenges to overcome, especially when it comes to the selection and handling of relevant technologies. As a consequence of the huge number of manifestations in this area, which grows each year, uncertainty and complexity increase. The lack of a classification approach causes a growing demand for more experts with broad knowledge and expertise. Using various techniques of ontology engineering and following the design science methodology, this work proposes the Big Data Technology Ontology (BDTOnto) as a comprehensive and sustainable classification approach to classify big data technologies and their manifestations. In particular, a reusable, extensible and adaptable artifact in the form of an ontology will be developed and evaluated.
|
18
|
Potoniec J, Wiśniewski D, Ławrynowicz A, Keet CM. Dataset of ontology competency questions to SPARQL-OWL queries translations. Data Brief 2020; 29:105098. [PMID: 31989008 PMCID: PMC6971340 DOI: 10.1016/j.dib.2019.105098] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/21/2019] [Revised: 12/20/2019] [Accepted: 12/26/2019] [Indexed: 11/28/2022] Open
Abstract
This data article reports on a new set of 234 competency questions for ontology development and their formalisation into a set of 131 SPARQL-OWL queries. This is the largest set of competency questions with their linked queries to date, covering several ontologies of different type in different subject domains developed by different groups of question authors and ontology developers. The dataset is focused specifically on the ontology TBox (terminological part). The dataset may serve as a manually created gold standard for testing and benchmarking, research into competency questions and querying ontologies, and tool development. The data is available in Mendeley Data. Its analysis is presented in “Analysis of Ontology Competency Questions and their formalizations in SPARQL-OWL” [15].
Affiliation(s)
- Jedrzej Potoniec
  - Faculty of Computing, Poznan University of Technology, Ul. Piotrowo 3, 60-965 Poznan, Poland
  - Center for Artificial Intelligence and Machine Learning, Poznan University of Technology, Ul. Piotrowo 2, 60-965 Poznan, Poland
- Dawid Wiśniewski
  - Faculty of Computing, Poznan University of Technology, Ul. Piotrowo 3, 60-965 Poznan, Poland
- Agnieszka Ławrynowicz
  - Faculty of Computing, Poznan University of Technology, Ul. Piotrowo 3, 60-965 Poznan, Poland
  - Center for Artificial Intelligence and Machine Learning, Poznan University of Technology, Ul. Piotrowo 2, 60-965 Poznan, Poland
- C Maria Keet
  - Department of Computer Science, University of Cape Town, Private Bag X3 Rondebosch 7701 South Africa
|
19
|
Appice A, Tsoumakas G, Manolopoulos Y, Matwin S. Semantic Annotation of Predictive Modelling Experiments. Discovery Science 2020. [PMCID: PMC7556382 DOI: 10.1007/978-3-030-61527-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
In this paper, we address the task of representation, semantic annotation, storage, and querying of predictive modelling experiments. We introduce OntoExp, an OntoDM module which gives a more granular representation of a predictive modelling experiment and enables annotation of the experiment’s provenance, algorithm implementations, parameter settings and output metrics. This module is incorporated in SemanticHub, an online system that allows execution, annotation, storage and querying of predictive modelling experiments. The system offers two different user scenarios. The users can either define their own experiment and execute it, or they can browse the repository of completed experimental workflows across different predictive modelling tasks. Here, we showcase the capabilities of the system by executing a multi-target regression experiment on a water quality prediction dataset using the Clus software. The system and created repositories are evaluated based on the FAIR data stewardship guidelines. The evaluation shows that OntoExp and SemanticHub provide the infrastructure needed for semantic annotation, execution, storage, and querying of the experiments.
|
20
|
Snider S, Scott II WL, Trewin S. Accessibility Information Needs in the Enterprise. ACM Transactions on Accessible Computing 2019. [DOI: 10.1145/3368620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
We describe the questions asked about accessibility, both through information searches and direct queries, within a large multinational corporation over a period of two years, finding an emphasis on topics covering enterprise requirements for testing, recording, and reporting compliance. Our analysis finds that up to 66% of these questions may be answerable by an accessibility ontology, but only 26% of the terms in the questions are concepts found in existing available accessibility ontologies. To fill this gap, we introduce the Enterprise Accessibility Conformance Ontology, which extends previous ontologies to include the relevant concepts. We demonstrate the use of the ontology to provide a unifying model of the accessibility domain that contributed to a 22% performance improvement for a question-answering accessibility conformance chatbot.
|
21
|
Flynn AJ, Friedman CP, Boisvert P, Landis‐Lewis Z, Lagoze C. The Knowledge Object Reference Ontology (KORO): A formalism to support management and sharing of computable biomedical knowledge for learning health systems. Learn Health Syst 2018; 2:e10054. [PMID: 31245583 PMCID: PMC6508779 DOI: 10.1002/lrh2.10054] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 08/10/2017] [Revised: 02/12/2018] [Accepted: 02/16/2018] [Indexed: 11/07/2022] Open
Abstract
INTRODUCTION: Health systems are challenged by care underutilization, overutilization, disparities, and related harms. One problem is a multiyear latency between discovery of new best practice knowledge and its widespread adoption. Decreasing this latency requires new capabilities to better manage and more rapidly share biomedical knowledge in computable forms. Knowledge objects package machine-executable knowledge resources in a way that easily enables knowledge as a service. To help improve knowledge management and accelerate knowledge sharing, the Knowledge Object Reference Ontology (KORO) defines what knowledge objects are in a formal way. METHODS: Development of KORO began with identification of terms for classes of entities and for properties. Next, we established a taxonomical hierarchy of classes for knowledge objects and their parts. Development continued by relating these parts via formally defined properties. We evaluated the logical consistency of KORO and used it to answer several competency questions about parthood. We also applied it to guide knowledge object implementation. RESULTS: As a realist ontology, KORO defines what knowledge objects are and provides details about the parts they have and the roles they play. KORO provides sufficient logic to answer several basic but important questions about knowledge objects competently. KORO directly supports creators of knowledge objects by providing a formal model for these objects. CONCLUSION: KORO provides a formal, logically consistent ontology about knowledge objects and their parts. It exists to help make computable biomedical knowledge findable, accessible, interoperable, and reusable. KORO is currently being used to further develop and improve computable knowledge infrastructure for learning health systems.
Affiliation(s)
- Allen J. Flynn
  - School of Information, University of Michigan, Ann Arbor, Michigan
  - School of Medicine, University of Michigan, Ann Arbor, Michigan
- Charles P. Friedman
  - School of Information, University of Michigan, Ann Arbor, Michigan
  - School of Medicine, University of Michigan, Ann Arbor, Michigan
  - School of Public Health, University of Michigan, Ann Arbor, Michigan
- Peter Boisvert
  - School of Medicine, University of Michigan, Ann Arbor, Michigan
- Carl Lagoze
  - School of Information, University of Michigan, Ann Arbor, Michigan
|
22
|
Supporting metabolomics with adaptable software: design architectures for the end-user. Curr Opin Biotechnol 2017; 43:110-117. [DOI: 10.1016/j.copbio.2016.11.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/22/2016] [Revised: 10/31/2016] [Accepted: 11/01/2016] [Indexed: 02/07/2023]
|
23
|
Gnimpieba EZ, VanDiermen MS, Gustafson SM, Conn B, Lushbough CM. Bio-TDS: bioscience query tool discovery system. Nucleic Acids Res 2017; 45:D1117-D1122. [PMID: 27924016 PMCID: PMC5210639 DOI: 10.1093/nar/gkw940] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/01/2016] [Revised: 09/24/2016] [Accepted: 10/17/2016] [Indexed: 11/12/2022] Open
Abstract
Bioinformatics and computational biology play a critical role in bioscience and biomedical research. As researchers design their experimental projects, one major challenge is to find the most relevant bioinformatics toolkits that will lead to new knowledge discovery from their data. The Bio-TDS (Bioscience Query Tool Discovery Systems, http://biotds.org/) has been developed to assist researchers in retrieving the most applicable analytic tools by allowing them to formulate their questions as free text. The Bio-TDS is a flexible retrieval system that affords users from multiple bioscience domains (e.g. genomic, proteomic, bio-imaging) the ability to query over 15 000 analytic tool descriptions integrated from well-established, community repositories. One of the primary components of the Bio-TDS is the ontology and natural language processing workflow for annotation, curation, query processing, and evaluation. The Bio-TDS’s scientific impact was evaluated using sample questions posed by researchers retrieved from Biostars, a site focusing on biological data analysis. The Bio-TDS was compared to five similar bioscience analytic tool retrieval systems with the Bio-TDS outperforming the others in terms of relevance and completeness. The Bio-TDS offers researchers the capacity to associate their bioscience question with the most relevant computational toolsets required for the data analysis in their knowledge discovery process.
Affiliation(s)
- Etienne Z Gnimpieba
  - Biomedical Engineering Department, University of South Dakota, 4800 North Career Ave, Sioux Falls, SD 57107, USA
  - BioSNTR, Brookings, SD 57006, USA
- Menno S VanDiermen
  - Biomedical Engineering Department, University of South Dakota, 4800 North Career Ave, Sioux Falls, SD 57107, USA
- Shayla M Gustafson
  - Biomedical Engineering Department, University of South Dakota, 4800 North Career Ave, Sioux Falls, SD 57107, USA
- Bill Conn
  - Biomedical Engineering Department, University of South Dakota, 4800 North Career Ave, Sioux Falls, SD 57107, USA
- Carol M Lushbough
  - Biomedical Engineering Department, University of South Dakota, 4800 North Career Ave, Sioux Falls, SD 57107, USA
  - BioSNTR, Brookings, SD 57006, USA
|
24
|
Zheng J, Harris MR, Masci AM, Lin Y, Hero A, Smith B, He Y. The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis. J Biomed Semantics 2016; 7:53. [PMID: 27627881 PMCID: PMC5024438 DOI: 10.1186/s13326-016-0100-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/07/2016] [Accepted: 09/06/2016] [Indexed: 11/13/2022] Open
Abstract
Background: Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. Results: The terms in OBCS including ‘data collection’, ‘data transformation in statistics’, ‘data visualization’, ‘statistical data analysis’, and ‘drawing a conclusion based on data’, cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprehends 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs. Conclusions: The Ontology of Biological and Clinical Statistics (OBCS) is a community-based open source ontology in the domain of biological and clinical statistics. OBCS is a timely ontology that represents statistics-related terms and their relations in a rigorous fashion, facilitates standard data analysis and integration, and supports reproducible biological and clinical research.
Affiliation(s)
- Jie Zheng
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, 19104, USA
- Marcelline R Harris
  - Division of Systems Leadership and Effectiveness Science, University of Michigan School of Nursing, Ann Arbor, MI, 48109, USA
- Anna Maria Masci
  - Department of Biostatistics and Bioinformatics, Duke Medical Center, Duke University, Durham, NC, 27710, USA
- Yu Lin
  - Department of Microbiology and Immunology, Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Alfred Hero
  - Department of Electrical Engineering and Computer Science, Department of Biomedical Engineering, and Department of Statistics, Michigan Institute of Data Science, University of Michigan, Ann Arbor, MI, 48109, USA
- Barry Smith
  - Department of Philosophy and National Center for Ontological Research, University at Buffalo, Buffalo, NY, 14203, USA
- Yongqun He
  - Department of Microbiology and Immunology, Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
|
25
|
Lewis J, Breeze CE, Charlesworth J, Maclaren OJ, Cooper J. Where next for the reproducibility agenda in computational biology? BMC Systems Biology 2016; 10:52. [PMID: 27422148 PMCID: PMC4946111 DOI: 10.1186/s12918-016-0288-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 03/22/2016] [Accepted: 06/08/2016] [Indexed: 11/24/2022]
Abstract
Background: The concept of reproducibility is a foundation of the scientific method. With the arrival of fast and powerful computers over the last few decades, there has been an explosion of results based on complex computational analyses and simulations. The reproducibility of these results has been addressed mainly in terms of exact replicability or numerical equivalence, ignoring the wider issue of the reproducibility of conclusions through equivalent, extended or alternative methods. Results: We use case studies from our own research experience to illustrate how concepts of reproducibility might be applied in computational biology. Several fields have developed ‘minimum information’ checklists to support the full reporting of computational simulations, analyses and results, and standardised data formats and model description languages can facilitate the use of multiple systems to address the same research question. We note the importance of defining the key features of a result to be reproduced, and the expected agreement between original and subsequent results. Dynamic, updatable tools for publishing methods and results are becoming increasingly common, but sometimes come at the cost of clear communication. In general, the reproducibility of computational research is improving but would benefit from additional resources and incentives. Conclusions: We conclude with a series of linked recommendations for improving reproducibility in computational biology through communication, policy, education and research practice. More reproducible research will lead to higher quality conclusions, deeper understanding and more valuable knowledge.
Affiliation(s)
- Joanna Lewis
  - Centre for Maths and Physics in the Life Sciences and Experimental Biology, University College London, Physics Building, Gower Place, London, WC1E 6BT, UK
  - NIHR Health Protection Research Unit in Modelling Methodology, Department of Infectious Disease Epidemiology, Imperial College London, St Mary's Campus, Norfolk Place, London, W2 1PG, UK
- Charles E Breeze
  - UCL Cancer Institute, University College London, 72 Huntley St, London, WC1E 6DD, UK
- Jane Charlesworth
  - Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, UK
- Oliver J Maclaren
  - Department of Mathematics, University of Auckland, Auckland, 1142, New Zealand
  - Department of Engineering Science, University of Auckland, Auckland, 1142, New Zealand
- Jonathan Cooper
  - Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, Oxford, OX1 3QD, UK
|
27
|
Ochs C, Perl Y, Geller J, Haendel M, Brush M, Arabandi S, Tu S. Summarizing and visualizing structural changes during the evolution of biomedical ontologies using a Diff Abstraction Network. J Biomed Inform 2015; 56:127-44. [PMID: 26048076 DOI: 10.1016/j.jbi.2015.05.018] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/25/2014] [Revised: 04/01/2015] [Accepted: 05/27/2015] [Indexed: 10/23/2022]
Abstract
Biomedical ontologies are a critical component in biomedical research and practice. As an ontology evolves, its structure and content change in response to additions, deletions and updates. When editing a biomedical ontology, small local updates may affect large portions of the ontology, leading to unintended and potentially erroneous changes. Such unwanted side effects often go unnoticed since biomedical ontologies are large and complex knowledge structures. Abstraction networks, which provide compact summaries of an ontology's content and structure, have been used to uncover structural irregularities, inconsistencies and errors in ontologies. In this paper, we introduce Diff Abstraction Networks ("Diff AbNs"), compact networks that summarize and visualize global structural changes due to ontology editing operations that result in a new ontology release. A Diff AbN can be used to support curators in identifying unintended and unwanted ontology changes. The derivation of two Diff AbNs, the Diff Area Taxonomy and the Diff Partial-area Taxonomy, is explained, and Diff Partial-area Taxonomies are derived and analyzed for the Ontology of Clinical Research, Sleep Domain Ontology, and eagle-i Research Resource Ontology. The use of Diff Taxonomies for identifying unintended erroneous consequences of quality assurance and ontology merging is demonstrated.
Affiliation(s)
- Christopher Ochs
- Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA.
- Yehoshua Perl
- Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- James Geller
- Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, USA
- Melissa Haendel
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Matthew Brush
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR 97239, USA
- Samson Tu
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA 94305, USA
|