1
Wagner MM, Hogan WR, Levander JD, Diller M. Towards Machine-FAIR: Representing software and datasets to facilitate reuse and scientific discovery by machines. J Biomed Inform 2024; 154:104647. [PMID: 38692465 PMCID: PMC11250896 DOI: 10.1016/j.jbi.2024.104647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 04/16/2024] [Accepted: 04/28/2024] [Indexed: 05/03/2024]
Abstract
OBJECTIVE: To use software, datasets, and data formats in the domain of Infectious Disease Epidemiology as a test collection to evaluate a novel M1 use case, which we introduce in this paper. M1 is a machine that upon receipt of a new digital object of research exhaustively finds all valid compositions of it with existing objects.
METHOD: We implemented a data-format-matching-only M1 using exhaustive search, which we refer to as M1DFM. We then ran M1DFM on the test collection and used error analysis to identify needed semantic constraints.
RESULTS: Precision of M1DFM search was 61.7%. Error analysis identified needed semantic constraints and needed changes in handling of data services. Most semantic constraints were simple, but one data format was sufficiently complex to be practically impossible to represent semantic constraints over, from which we conclude limitatively that software developers will have to meet the machines halfway by engineering software whose inputs are sufficiently simple that their semantic constraints can be represented, akin to the simple APIs of services. We summarize these insights as M1-FAIR guiding principles for composability and suggest a roadmap for progressively capable devices in the service of reuse and accelerated scientific discovery.
CONCLUSION: Algorithmic search of digital repositories for valid workflow compositions has potential to accelerate scientific discovery but requires a scalable solution to the problem of knowledge acquisition about semantic constraints on software inputs. Additionally, practical limitations on the logical complexity of semantic constraints must be respected, which has implications for the design of software.
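The data-format-matching search described in this abstract can be illustrated with a minimal sketch. It is not the authors' M1DFM implementation: the `DigitalObject` class, the object names, and the format labels are all hypothetical, and the sketch only finds direct pairings rather than longer workflow chains. It assumes each object declares the formats it consumes and produces, and it pairs the new object with every existing object that shares a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical record: a dataset or program with declared data formats."""
    name: str
    inputs: frozenset = frozenset()    # formats the object can consume
    outputs: frozenset = frozenset()   # formats the object produces


def compositions(new, existing):
    """Exhaustively list (producer, consumer, format) pairings involving `new`.

    Only the data format is compared; no semantic constraint is checked.
    """
    found = []
    for other in existing:
        # An existing object can feed the new one.
        for fmt in sorted(other.outputs & new.inputs):
            found.append((other.name, new.name, fmt))
        # The new object can feed an existing one.
        for fmt in sorted(new.outputs & other.inputs):
            found.append((new.name, other.name, fmt))
    return found


# Hypothetical repository contents and a newly received object.
repository = [
    DigitalObject("case_counts_dataset", outputs=frozenset({"csv"})),
    DigitalObject("plotter", inputs=frozenset({"model-output-json"})),
    DigitalObject("genome_dataset", outputs=frozenset({"fasta"})),
]
simulator = DigitalObject(
    "epidemic_simulator",
    inputs=frozenset({"csv"}),
    outputs=frozenset({"model-output-json"}),
)

matches = compositions(simulator, repository)
for match in matches:
    print(match)
```

Because the match is on format alone, a pairing such as a CSV of the wrong disease or region feeding the simulator would still be reported, which is the kind of error that semantic constraints are meant to rule out.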
Affiliation(s)
- Michael M Wagner
  - Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA.
- William R Hogan
  - Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA
- John D Levander
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Matthew Diller
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
2
Cano MA, Tsueng G, Zhou X, Xin J, Hughes LD, Mullen JL, Su AI, Wu C. Schema Playground: a tool for authoring, extending, and using metadata schemas to improve FAIRness of biomedical data. BMC Bioinformatics 2023; 24:159. [PMID: 37081398 PMCID: PMC10116472 DOI: 10.1186/s12859-023-05258-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 03/27/2023] [Indexed: 04/22/2023] Open
Abstract
BACKGROUND: Biomedical researchers are strongly encouraged to make their research outputs more Findable, Accessible, Interoperable, and Reusable (FAIR). While many biomedical research outputs are more readily accessible through open data efforts, finding relevant outputs remains a significant challenge. Schema.org is a metadata vocabulary standardization project that enables web content creators to make their content more FAIR. Leveraging Schema.org could benefit biomedical research resource providers, but it can be challenging to apply Schema.org standards to biomedical research outputs. We created an online browser-based tool that empowers researchers and repository developers to utilize Schema.org or other biomedical schema projects.
RESULTS: Our browser-based tool includes features that can help address many of the barriers towards Schema.org compliance, such as the ability to easily browse for relevant Schema.org classes, the ability to extend and customize a class to be more suitable for biomedical research outputs, the ability to create data validation to ensure adherence of a research output to a customized class, and the ability to register a custom class to our schema registry, enabling others to search and re-use it. We demonstrate the use of our tool with the creation of the Outbreak.info schema, a large multi-class schema for harmonizing various COVID-19 related resources.
CONCLUSIONS: We have created a browser-based tool to empower biomedical research resource providers to leverage Schema.org classes to make their research outputs more FAIR.
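The idea of extending a class and then validating a record against the customized class can be sketched as follows. This is only an illustration under stated assumptions, not the tool's actual API or schema format: the base property set is a heavily simplified stand-in for a Schema.org class, the `pathogenName` property is hypothetical, and every property is treated as required.

```python
# Simplified stand-in for a base class: property name -> expected Python type.
BASE_DATASET = {"name": str, "description": str}


def extend(base, extra):
    """Return a customized class: the base properties plus extra ones."""
    merged = dict(base)
    merged.update(extra)
    return merged


def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for prop, expected in schema.items():
        if prop not in record:
            errors.append(f"missing property: {prop}")
        elif not isinstance(record[prop], expected):
            errors.append(f"wrong type for {prop}: expected {expected.__name__}")
    return errors


# Hypothetical customized class with one added property.
CUSTOM_DATASET = extend(BASE_DATASET, {"pathogenName": str})

good = {"name": "Example set", "description": "Counts", "pathogenName": "X"}
bad = {"name": "Example set", "description": 5}

print(validate(good, CUSTOM_DATASET))
print(validate(bad, CUSTOM_DATASET))
```

A real validator would typically distinguish required from optional properties and support nested types; the sketch keeps only the extend-then-check pattern.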
Affiliation(s)
- Jiwen Xin
  - The Scripps Research Institute, San Diego, USA
- Andrew I Su
  - The Scripps Research Institute, San Diego, USA
- Chunlei Wu
  - The Scripps Research Institute, San Diego, USA
3
Hughes LD, Tsueng G, DiGiovanna J, Horvath TD, Rasmussen LV, Savidge TC, Stoeger T, Turkarslan S, Wu Q, Wu C, Su AI, Pache L. Addressing barriers in FAIR data practices for biomedical data. Sci Data 2023; 10:98. [PMID: 36823198 PMCID: PMC9950056 DOI: 10.1038/s41597-023-01969-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 01/13/2023] [Indexed: 02/25/2023] Open
Affiliation(s)
- Laura D Hughes
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
- Ginger Tsueng
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Jack DiGiovanna
  - Velsera, 529 Main St, Suite 6610, Charlestown, MA, 02129, USA
- Thomas D Horvath
  - Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
  - Texas Children's Microbiome Center, Department of Pathology, Texas Children's Hospital, Houston, TX, 77030, USA
- Luke V Rasmussen
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Tor C Savidge
  - Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
  - Texas Children's Microbiome Center, Texas Children's Hospital, Houston, TX, 77030, USA
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, McCormick School of Engineering, Evanston, IL, 60208, USA
- Qinglong Wu
  - Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, 77030, USA
  - Texas Children's Microbiome Center, Texas Children's Hospital, Houston, TX, 77030, USA
- Chunlei Wu
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
  - Scripps Research Translational Institute, La Jolla, CA, 92037, USA
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Andrew I Su
  - Department of Integrative, Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
  - Scripps Research Translational Institute, La Jolla, CA, 92037, USA
  - Department of Molecular Medicine, The Scripps Research Institute, La Jolla, CA, 92037, USA
- Lars Pache
  - Infectious and Inflammatory Disease Center, Immunity and Pathogenesis Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, 92037, USA