1
Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis J. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024; 52:e69. [PMID: 38967009 PMCID: PMC11347148 DOI: 10.1093/nar/gkae566] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024] Open
Abstract
Knowledge about transcription factor binding and regulation, target genes, cis-regulatory modules and topologically associating domains is not only defined by functional associations like biological processes or diseases but also has a determinative genome location aspect. Here, we exploit these location and functional aspects together to develop new strategies to enable advanced data querying. Many databases have been developed to provide information about enhancers, but a schema that allows the standardized representation of data, securing interoperability between resources, has been lacking. In this work, we use knowledge graphs for the standardized representation of enhancers and topologically associating domains, together with data about their target genes, transcription factors, location on the human genome, and functional data about diseases and gene ontology annotations. We used this schema to integrate twenty-five enhancer datasets and two domain datasets, creating the most powerful integrative resource in this field to date. The knowledge graphs have been implemented using the Resource Description Framework and integrated within the open-access BioGateway knowledge network, generating a resource that contains an interoperable set of knowledge graphs (enhancers, TADs, genes, proteins, diseases, GO terms, and interactions between domains). We show how advanced queries, which combine functional and location restrictions, can be used to develop new hypotheses about functional aspects of gene expression regulation.
Affiliation(s)
- Juan Mulero-Hernández
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Vladimir Mironov
  - Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- José Antonio Miñarro-Giménez
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
- Martin Kuiper
  - Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Jesualdo Tomás Fernández-Breis
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, Instituto Murciano de Investigación Biosanitaria (IMIB), 30100 Murcia, Spain
2
Wenk EH, Sauquet H, Gallagher RV, Brownlee R, Boettiger C, Coleman D, Yang S, Auld T, Barrett R, Brodribb T, Choat B, Dun L, Ellsworth D, Gosper C, Guja L, Jordan GJ, Le Breton T, Leigh A, Lu-Irving P, Medlyn B, Nolan R, Ooi M, Sommerville KD, Vesk P, White M, Wright IJ, Falster DS. The AusTraits plant dictionary. Sci Data 2024; 11:537. [PMID: 38796535 PMCID: PMC11127939 DOI: 10.1038/s41597-024-03368-z] [Received: 11/09/2023] [Accepted: 05/10/2024] [Indexed: 05/28/2024] Open
Abstract
Traits with intuitive names, a clear scope and explicit description are essential for all trait databases. The lack of unified, comprehensive, and machine-readable plant trait definitions limits the utility of trait databases, including reanalysis of data from a single database, or analyses that integrate data across multiple databases. Both can only occur if researchers are confident the trait concepts are consistent within and across sources. Here we describe the AusTraits Plant Dictionary (APD), a new data source of terms that extends the trait definitions included in a recent trait database, AusTraits. The development process of the APD included three steps: review and formalisation of the scope of each trait and the accompanying trait description; addition of trait metadata; and publication in both human and machine-readable forms. Trait definitions include keywords, references, and links to related trait concepts in other databases, enabling integration of AusTraits with other sources. The APD will both improve the usability of AusTraits and foster the integration of trait data across global and regional plant trait databases.
Affiliation(s)
- Elizabeth H Wenk
  - Evolution & Ecology Research Centre, University of New South Wales, Sydney, Australia
- Hervé Sauquet
  - Evolution & Ecology Research Centre, University of New South Wales, Sydney, Australia
  - National Herbarium of NSW, Botanic Gardens of Sydney, Mount Annan, NSW, Australia
- Rachael V Gallagher
  - Hawkesbury Institute for the Environment, Western Sydney University, Sydney, Australia
- Rowan Brownlee
  - Australian Research Data Commons, Caulfield East, Australia
- Carl Boettiger
  - Department of Environmental Science, Policy, & Management, University of California, Berkeley, USA
- David Coleman
  - Evolution & Ecology Research Centre, University of New South Wales, Sydney, Australia
  - School of Natural Sciences, Macquarie University, Macquarie Park, Australia
- Sophie Yang
  - Evolution & Ecology Research Centre, University of New South Wales, Sydney, Australia
- Tony Auld
  - NSW Department of Planning and Environment, Parramatta, Australia
  - University of Wollongong, Wollongong, Australia
  - Centre for Ecosystem Science, University of New South Wales, Sydney, Australia
- Russell Barrett
  - Evolution & Ecology Research Centre, University of New South Wales, Sydney, Australia
  - National Herbarium of NSW, Botanic Gardens of Sydney, Mount Annan, NSW, Australia
- Timothy Brodribb
  - School of Biological Sciences, University of Tasmania, Hobart, Australia
- Brendan Choat
  - Hawkesbury Institute for the Environment, Western Sydney University, Sydney, Australia
- Lily Dun
  - Evolution & Ecology Research Centre, University of New South Wales, Sydney, Australia
  - National Herbarium of NSW, Botanic Gardens of Sydney, Mount Annan, NSW, Australia
  - Hawkesbury Institute for the Environment, Western Sydney University, Sydney, Australia
- David Ellsworth
  - Hawkesbury Institute for the Environment, Western Sydney University, Sydney, Australia
- Carl Gosper
  - Biodiversity and Conservation Science, Department of Biodiversity, Conservation and Attractions, Kensington, WA, Australia
- Lydia Guja
  - Centre for Australian National Biodiversity Research, Canberra, Australia
  - National Seed Bank, Australian National Botanic Gardens, Department of Climate Change, Energy, the Environment and Water, Canberra, Australia
- Gregory J Jordan
  - School of Biological Sciences, University of Tasmania, Hobart, Australia
- Tom Le Breton
  - Centre for Ecosystem Science, University of New South Wales, Sydney, Australia
- Andrea Leigh
  - School of Life Sciences, University of Technology Sydney, Broadway, Australia
- Patricia Lu-Irving
  - National Herbarium of NSW, Botanic Gardens of Sydney, Mount Annan, NSW, Australia
- Belinda Medlyn
  - Hawkesbury Institute for the Environment, Western Sydney University, Sydney, Australia
- Rachael Nolan
  - Hawkesbury Institute for the Environment, Western Sydney University, Sydney, Australia
- Mark Ooi
  - Centre for Ecosystem Science, University of New South Wales, Sydney, Australia
- Peter Vesk
  - School of Agriculture, Food and Ecosystem Sciences, University of Melbourne, Parkville, Australia
- Matthew White
  - Arthur Rylah Institute for Environmental Research, Victorian Department of Energy, Environment and Climate Action, East Melbourne, Australia
- Ian J Wright
  - Hawkesbury Institute for the Environment, Western Sydney University, Sydney, Australia
  - School of Natural Sciences, Macquarie University, Macquarie Park, Australia
- Daniel S Falster
  - Evolution & Ecology Research Centre, University of New South Wales, Sydney, Australia
3
Alper P, Dĕd V, Herzinger S, Grouès V, Peter S, Lebioda J, Ebermann L, Popleteeva M, Barry ND, Welter D, Ghosh S, Becker R, Schneider R, Gu W, Trefois C, Satagopam V. DS-PACK: Tool assembly for the end-to-end support of controlled access human data sharing. Sci Data 2024; 11:501. [PMID: 38750048 PMCID: PMC11096168 DOI: 10.1038/s41597-024-03326-9] [Received: 10/03/2023] [Accepted: 04/29/2024] [Indexed: 05/18/2024] Open
Abstract
The EU General Data Protection Regulation (GDPR) requirements have prompted a shift from centralised controlled access genome-phenome archives to federated models for sharing sensitive human data. In a data-sharing federation, a central node facilitates data discovery; meanwhile, distributed nodes are responsible for handling data access requests, concluding agreements with data users and providing secure access to the data. Research institutions that want to become part of such federations often lack the resources to set up the required controlled access processes. The DS-PACK tool assembly is a reusable, open-source middleware solution that semi-automates controlled access processes end-to-end, from data submission to access. Data protection principles are engraved into all components of the DS-PACK assembly. DS-PACK centralises access control management and distributes access control enforcement with support for data access via cloud-based applications. DS-PACK is in production use at the ELIXIR Luxembourg data hosting platform, combined with an operational model including legal facilitation and data stewardship.
Affiliation(s)
- Pinar Alper
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Vilém Dĕd
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Sascha Herzinger
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Valentin Grouès
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Sarah Peter
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Jacek Lebioda
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Linda Ebermann
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Marina Popleteeva
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Nene Djenaba Barry
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Danielle Welter
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Soumyabrata Ghosh
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Regina Becker
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Reinhard Schneider
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Wei Gu
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Christophe Trefois
  - Luxembourg National Data Service, PNED GIE, Esch-sur-Alzette, L-4362, Luxembourg
  - ELIXIR Luxembourg, Belvaux, Luxembourg
- Venkata Satagopam
  - ELIXIR Luxembourg, Belvaux, Luxembourg
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
4
van Rijn JPM, Martens M, Ammar A, Cimpan MR, Fessard V, Hoet P, Jeliazkova N, Murugadoss S, Vinković Vrček I, Willighagen EL. From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials. J Cheminform 2024; 16:49. [PMID: 38693555 PMCID: PMC11064368 DOI: 10.1186/s13321-024-00833-0] [Received: 06/29/2023] [Accepted: 03/23/2024] [Indexed: 05/03/2024] Open
Abstract
Adverse Outcome Pathways (AOPs) have been proposed to facilitate mechanistic understanding of interactions of chemicals/materials with biological systems. Each AOP starts with a molecular initiating event (MIE) and possibly ends with adverse outcome(s) (AOs) via a series of key events (KEs). So far, the interaction of engineered nanomaterials (ENMs) with biomolecules, biomembranes, cells, and biological structures, in general, is not yet fully elucidated. There is also a huge lack of information on which AOPs are ENMs-relevant or -specific, despite numerous published data on toxicological endpoints they trigger, such as oxidative stress and inflammation. We propose to integrate related data and knowledge recently collected. Our approach combines the annotation of nanomaterials and their MIEs with ontology annotation to demonstrate how we can then query AOPs and biological pathway information for these materials. We conclude that a FAIR (Findable, Accessible, Interoperable, Reusable) representation of the ENM-MIE knowledge simplifies integration with other knowledge. SCIENTIFIC CONTRIBUTION: This study introduces a new database linking nanomaterial stressors to the first known MIE or KE. Second, it presents a reproducible workflow to analyze and summarize this knowledge. Third, this work extends the use of semantic web technologies to the field of nanoinformatics and nanosafety.
Affiliation(s)
- Jeaphianne P M van Rijn
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
- Marvin Martens
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
- Ammar Ammar
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
- Mihaela Roxana Cimpan
  - Department of Clinical Dentistry, Faculty of Medicine, University of Bergen, Bergen, Norway
- Valerie Fessard
  - Fougères Laboratory, Anses, French Agency for Food, Environmental and Occupational Health and Safety, Toxicology of Contaminants Unit, Fougères, France
- Peter Hoet
  - Laboratory of Toxicology, Unit of Environment and Health, Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
- Sivakumar Murugadoss
  - Laboratory of Toxicology, Unit of Environment and Health, Department of Public Health and Primary Care, KU Leuven, Leuven, Belgium
  - SD Chemical and Physical Health Risks, Brussels, Belgium
- Egon L Willighagen
  - Dept of Bioinformatics, BiGCaT, NUTRIM, FHML, Maastricht University, Maastricht, The Netherlands
5
Galgonek J, Vondrášek J. The IDSM mass spectrometry extension: searching mass spectra using SPARQL. Bioinformatics 2024; 40:btae174. [PMID: 38561173 PMCID: PMC11034985 DOI: 10.1093/bioinformatics/btae174] [Received: 12/07/2023] [Revised: 02/24/2024] [Accepted: 03/28/2024] [Indexed: 04/04/2024] Open
Abstract
SUMMARY: The Integrated Database of Small Molecules (IDSM) integrates data from small-molecule datasets, making them accessible through the SPARQL query language. Its unique feature is the ability to search for compounds through SPARQL based on their molecular structure. We extended IDSM to enable mass spectra databases to be integrated and searched for based on mass spectrum similarity. As sources of mass spectra, we employed the MassBank of North America database and the In Silico Spectral Database of natural products. AVAILABILITY AND IMPLEMENTATION: The extension is an integral part of IDSM, which is available at https://idsm.elixir-czech.cz. The manual and usage examples are available at https://idsm.elixir-czech.cz/docs/ms. The source codes of all IDSM parts are available under open-source licences at https://github.com/idsm-src.
Affiliation(s)
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, Prague 160 00, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, Prague 160 00, Czech Republic
6
Niehues A, de Visser C, Hagenbeek FA, Kulkarni P, Pool R, Karu N, Kindt ASD, Singh G, Vermeiren RRJM, Boomsma DI, van Dongen J, ’t Hoen PAC, van Gool AJ. A multi-omics data analysis workflow packaged as a FAIR Digital Object. Gigascience 2024; 13:giad115. [PMID: 38217405 PMCID: PMC10787363 DOI: 10.1093/gigascience/giad115] [Received: 06/13/2023] [Revised: 11/14/2023] [Accepted: 12/10/2023] [Indexed: 01/15/2024] Open
Abstract
BACKGROUND: Applying good data management and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles in research projects can help disentangle knowledge discovery, study result reproducibility, and data reuse in future studies. Based on the concepts of the original FAIR principles for research data, FAIR principles for research software were recently proposed. FAIR Digital Objects enable discovery and reuse of Research Objects, including computational workflows for both humans and machines. Practical examples can help promote the adoption of FAIR practices for computational workflows in the research community. We developed a multi-omics data analysis workflow implementing FAIR practices to share it as a FAIR Digital Object. FINDINGS: We conducted a case study investigating shared patterns between multi-omics data and childhood externalizing behavior. The analysis workflow was implemented as a modular pipeline in the workflow manager Nextflow, including containers with software dependencies. We adhered to software development practices like version control, documentation, and licensing. Finally, the workflow was described with rich semantic metadata, packaged as a Research Object Crate, and shared via WorkflowHub. CONCLUSIONS: Along with the packaged multi-omics data analysis workflow, we share our experiences adopting various FAIR practices and creating a FAIR Digital Object. We hope our experiences can help other researchers who develop omics data analysis workflows to turn FAIR principles into practice.
Affiliation(s)
- Anna Niehues
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, the Netherlands
- Casper de Visser
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Fiona A Hagenbeek
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Purva Kulkarni
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, the Netherlands
  - Department of Human Genetics, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- René Pool
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
- Naama Karu
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
- Alida S D Kindt
  - Metabolomics and Analytics Centre, Leiden Academic Centre for Drug Research, Leiden University, 2333 AL Leiden, The Netherlands
- Gurnoor Singh
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Robert R J M Vermeiren
  - Department of Child and Adolescent Psychiatry, LUMC-Curium, Leiden University Medical Center, 2342 AK Oegstgeest, The Netherlands
- Dorret I Boomsma
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
- Jenny van Dongen
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Public Health Research Institute, 1081 BT Amsterdam, The Netherlands
  - Amsterdam Reproduction & Development (AR&D) Research Institute, 1081 BT Amsterdam, The Netherlands
- Peter A C ’t Hoen
  - Department of Medical BioSciences, Radboud University Medical Center, 6525 GA Nijmegen, The Netherlands
- Alain J van Gool
  - Translational Metabolic Laboratory, Department of Laboratory Medicine, Radboud University Medical Center, 6525 GA Nijmegen, the Netherlands
7
Abad-Navarro F, Martínez-Costa C. A knowledge graph-based data harmonization framework for secondary data reuse. Computer Methods and Programs in Biomedicine 2024; 243:107918. [PMID: 37981455 DOI: 10.1016/j.cmpb.2023.107918] [Received: 02/22/2023] [Revised: 10/02/2023] [Accepted: 11/05/2023] [Indexed: 11/21/2023]
Abstract
BACKGROUND AND OBJECTIVE: The adoption of new technologies in clinical care systems has made a great amount of valuable data available. However, this data is usually heterogeneous and must be harmonized before it can be integrated and analysed. We propose a semantic-driven harmonization framework that (1) enables the meaningful sharing and integration of healthcare data across institutions and (2) facilitates the analysis and exploitation of the shared data. METHODS: The framework includes an ontology-based common data model (i.e. SCDM), a data transformation pipeline and a semantic query system. Heterogeneous datasets, mapped to different terminologies, are integrated by using an ontology-based infrastructure rooted in a top-level ontology. A graph database is generated by using these mappings, and a web-based semantic query system facilitates data exploration. RESULTS: Several datasets from different European institutions have been integrated by using the framework in the context of the European H2020 Precise4Q project. Through the query system, data scientists were able to explore data and use it for building machine learning models. CONCLUSIONS: The flexible data representation using RDF, together with the formal semantic underpinning provided by the SCDM, has enabled the semantic integration, query and advanced exploitation of heterogeneous data in the context of the Precise4Q project.
Affiliation(s)
- Francisco Abad-Navarro
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, 30100, Murcia, Spain
- Catalina Martínez-Costa
  - Departamento de Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, 30100, Murcia, Spain
8
Bernabé CH, Queralt-Rosinach N, Silva Souza VE, Bonino da Silva Santos LO, Mons B, Jacobsen A, Roos M. The use of foundational ontologies in biomedical research. J Biomed Semantics 2023; 14:21. [PMID: 38082345 PMCID: PMC10712036 DOI: 10.1186/s13326-023-00300-z] [Received: 08/04/2022] [Accepted: 11/29/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND: The FAIR principles recommend the use of controlled vocabularies, such as ontologies, to define data and metadata concepts. Ontologies are currently modelled following different approaches, sometimes describing conflicting definitions of the same concepts, which can affect interoperability. To cope with that, prior literature suggests organising ontologies in levels, where domain specific (low-level) ontologies are grounded in domain independent high-level ontologies (i.e., foundational ontologies). In this level-based organisation, foundational ontologies work as translators of intended meaning, thus improving interoperability. Despite their considerable acceptance in biomedical research, there are very few studies testing foundational ontologies. This paper describes a systematic literature mapping that was conducted to understand how foundational ontologies are used in biomedical research and to find empirical evidence supporting their claimed (dis)advantages. RESULTS: From a set of 79 selected papers, we identified that foundational ontologies are used for several purposes: ontology construction, repair, mapping, and ontology-based data analysis. Foundational ontologies are claimed to improve interoperability, enhance reasoning, speed up ontology development and facilitate maintainability. The complexity of using foundational ontologies is the most commonly cited downside. Despite being used for several purposes, there were hardly any experiments (1 paper) testing the claims for or against the use of foundational ontologies. In the subset of 49 papers that describe the development of an ontology, we observed low adherence to formal methods for ontology construction (16 papers) and ontology evaluation (4 papers). CONCLUSION: Our findings have two main implications. First, the lack of empirical evidence about the use of foundational ontologies indicates a need for evaluating the use of such artefacts in biomedical research. Second, the low adherence to formal methods illustrates how the field could benefit from a more systematic approach when dealing with the development and evaluation of ontologies. The understanding of how foundational ontologies are used in the biomedical field can drive future research towards the improvement of ontologies and, consequently, data FAIRness. The adoption of formal methods can impact the quality and sustainability of ontologies, and reusing these methods from other fields is encouraged.
Affiliation(s)
- César H Bernabé
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Luiz Olavo Bonino da Silva Santos
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  - University of Twente, Enschede, The Netherlands
- Barend Mons
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Annika Jacobsen
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Marco Roos
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
9
Seneviratne O, Das AK, Chari S, Agu NN, Rashid SM, McCusker J, Franklin JS, Qi M, Bennett KP, Chen CH, Hendler JA, McGuinness DL. Semantically enabling clinical decision support recommendations. J Biomed Semantics 2023; 14:8. [PMID: 37464259 DOI: 10.1186/s13326-023-00285-9] [Received: 04/30/2021] [Accepted: 03/28/2023] [Indexed: 07/20/2023] Open
Abstract
BACKGROUND: Clinical decision support systems have been widely deployed to guide healthcare decisions on patient diagnosis, treatment choices, and patient management through evidence-based recommendations. These recommendations are typically derived from clinical practice guidelines created by clinical specialties or healthcare organizations. Although there have been many different technical approaches to encoding guideline recommendations into decision support systems, much of the previous work has not focused on enabling system generated recommendations through the formalization of changes in a guideline, the provenance of a recommendation, and applicability of the evidence. Prior work indicates that healthcare providers may not find that guideline-derived recommendations always meet their needs for reasons such as lack of relevance, transparency, time pressure, and applicability to their clinical practice. RESULTS: We introduce several semantic techniques that model diseases based on clinical practice guidelines, provenance of the guidelines, and the study cohorts they are based on to enhance the capabilities of clinical decision support systems. We have explored ways to enable clinical decision support systems with semantic technologies that can represent and link to details in related items from the scientific literature and quickly adapt to changing information from the guidelines, identifying gaps, and supporting personalized explanations. Previous semantics-driven clinical decision systems have limited support in all these aspects, and we present the ontologies and semantic web based software tools in three distinct areas that are unified using a standard set of ontologies and a custom-built knowledge graph framework: (i) guideline modeling to characterize diseases, (ii) guideline provenance to attach evidence to treatment decisions from authoritative sources, and (iii) study cohort modeling to identify relevant research publications for complicated patients. CONCLUSIONS: We have enhanced existing, evidence-based knowledge by developing ontologies and software that enable clinicians to conveniently access updates to and provenance of guidelines, as well as gather additional information from research studies applicable to their patients' unique circumstances. Our software solutions leverage many well-used existing biomedical ontologies and build upon decades of knowledge representation and reasoning work, leading to explainable results.
Affiliation(s)
- Shruthi Chari
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Sabbir M Rashid
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Jamie McCusker
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Jade S Franklin
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- Miao Qi
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
- James A Hendler
  - Rensselaer Polytechnic Institute, 110 8th St, 12180, Troy, NY, USA
10
|
Queder N, Tien VB, Abraham SA, Urchs SGW, Helmer KG, Chaplin D, van Erp TGM, Kennedy DN, Poline JB, Grethe JS, Ghosh SS, Keator DB. NIDM-Terms: community-based terminology management for improved neuroimaging dataset descriptions and query. Front Neuroinform 2023; 17:1174156. [PMID: 37533796 PMCID: PMC10392125 DOI: 10.3389/fninf.2023.1174156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 06/27/2023] [Indexed: 08/04/2023] Open
Abstract
The biomedical research community is motivated to share and reuse data from studies and projects by funding agencies and publishers. Effectively combining and reusing neuroimaging data from publicly available datasets requires the capability to query across datasets in order to identify cohorts that match both neuroimaging and clinical/behavioral data criteria. Critical barriers to operationalizing such queries include, in part, the broad use of undefined study variables with limited or no annotations, which makes it difficult to understand the data available without significant interaction with the original authors. Using the Brain Imaging Data Structure (BIDS) to organize neuroimaging data has made querying across studies for specific image types possible at scale. However, in BIDS, beyond file naming and tightly controlled imaging directory structures, there are very few constraints on ancillary variable naming/meaning or experiment-specific metadata. In this work, we present NIDM-Terms, a set of user-friendly terminology management tools and associated software to better manage individual lab terminologies and help with annotating BIDS datasets. Using these tools to annotate BIDS data with a Neuroimaging Data Model (NIDM) semantic web representation enables queries across datasets to identify cohorts with specific neuroimaging and clinical/behavioral measurements. This manuscript describes the overall informatics structures and demonstrates the use of tools to annotate BIDS datasets to perform integrated cross-cohort queries.
Affiliation(s)
- Nazek Queder
  - Department of Psychiatry and Human Behavior, School of Medicine, University of California, Irvine, Irvine, CA, United States
  - Department of Neurobiology and Behavior and Center for the Neurobiology of Learning and Memory, University of California, Irvine, Irvine, CA, United States
- Vivian B. Tien
  - Fairmont Preparatory Academy, Anaheim, CA, United States
- Sanu Ann Abraham
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Sebastian Georg Wenzel Urchs
  - NeuroDataScience–ORIGAMI Laboratory, McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, Montreal, QC, Canada
- Karl G. Helmer
  - Massachusetts General Hospital, Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Derek Chaplin
  - Massachusetts General Hospital, Boston, MA, United States
- Theo G. M. van Erp
  - Clinical Translational Neuroscience Laboratory, Department of Psychiatry and Human Behavior, School of Medicine, University of California, Irvine, Irvine, CA, United States
  - Center for the Neurobiology of Learning and Memory, University of California, Irvine, Irvine, CA, United States
- David N. Kennedy
  - Departments of Psychiatry and Radiology, University of Massachusetts Chan Medical School, Worcester, MA, United States
- Jean-Baptiste Poline
  - NeuroDataScience–ORIGAMI Laboratory, McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, Montreal, QC, Canada
- Jeffrey S. Grethe
  - Department of Neurosciences, School of Medicine, University of California, San Diego, San Diego, CA, United States
- Satrajit S. Ghosh
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- David B. Keator
  - Department of Psychiatry and Human Behavior, School of Medicine, University of California, Irvine, Irvine, CA, United States
|
11
|
Zhang S, Benis N, Cornet R. Automated approach for quality assessment of RDF resources. BMC Med Inform Decis Mak 2023; 23:90. [PMID: 37165363 PMCID: PMC10170671 DOI: 10.1186/s12911-023-02182-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 04/20/2023] [Indexed: 05/12/2023] Open
Abstract
INTRODUCTION The Semantic Web community provides a common Resource Description Framework (RDF) that allows representation of resources such that they can be linked. To maximize the potential of linked data - machine-actionable interlinked resources on the Web - a certain level of quality of RDF resources should be established, particularly in the biomedical domain in which concepts are complex and high-quality biomedical ontologies are in high demand. However, it is unclear which quality metrics for RDF resources exist that can be automated, which is required given the multitude of RDF resources. Therefore, we aim to determine these metrics and demonstrate an automated approach to assess such metrics of RDF resources. METHODS An initial set of metrics is identified through literature, standards, and existing tooling. Of these, metrics are selected that fulfil these criteria: (1) objective; (2) automatable; and (3) foundational. Selected metrics are represented in RDF and semantically aligned to existing standards. These metrics are then implemented in an open-source tool. To demonstrate the tool, eight commonly used RDF resources were assessed, including data models in the healthcare domain (HL7 RIM, HL7 FHIR, CDISC CDASH), ontologies (DCT, SIO, FOAF, ORDO), and a metadata profile (GRDDL). RESULTS Six objective metrics are identified in 3 categories: Resolvability (1), Parsability (1), and Consistency (4), and represented in RDF. The tool demonstrates that these metrics can be automated, and application in the healthcare domain shows non-resolvable URIs (ranging from 0.3% to 97%) among all eight resources and undefined URIs in HL7 RIM and FHIR. In the tested resources no errors were found for parsability and the other three consistency metrics for correct usage of classes and properties. CONCLUSION We extracted six objective and automatable metrics from literature, as the foundational quality requirements of RDF resources to maximize the potential of linked data. Automated tooling to assess resources has been shown to be effective in identifying quality issues that must be avoided. This approach can be expanded to incorporate more automatable metrics so as to reflect additional quality dimensions with the assessment tool implementing more metrics.
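As an illustration of what an automatable consistency check of this kind might look like, the sketch below flags "undefined URIs": terms in a resource's own namespace that are used as a property or as a class but never appear as the subject of any statement. This is only one plausible reading of the metric, not the authors' tool; the function name, the tuple representation of triples, and the `http://example.org/model#` namespace are all assumptions made for the example.

```python
# Hypothetical sketch of an "undefined URI" consistency check.
# Triples are plain (subject, predicate, object) tuples of strings;
# a real tool would obtain them from an RDF parser.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def undefined_uris(triples, namespace):
    """Return URIs in `namespace` that are used as a predicate, or as the
    object of rdf:type, but never occur as the subject of a triple."""
    defined = {s for s, _, _ in triples}
    used = set()
    for _, p, o in triples:
        used.add(p)            # every predicate counts as a used property
        if p == RDF_TYPE:
            used.add(o)        # the object of rdf:type is a used class
    return sorted(u for u in used
                  if u.startswith(namespace) and u not in defined)


# Illustrative data only (assumed namespace and terms).
EX = "http://example.org/model#"
example = [
    (EX + "Patient", RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
    (EX + "alice", RDF_TYPE, EX + "Patient"),
    (EX + "alice", EX + "hasAge", "42"),
    (EX + "bob", RDF_TYPE, EX + "Person"),
]

if __name__ == "__main__":
    for uri in undefined_uris(example, EX):
        print("undefined:", uri)
```

Here `Patient` is described in the resource, while `Person` and `hasAge` are used without ever being described, so only the latter two are reported. Terms from external namespaces are ignored because their definitions are expected to live elsewhere.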
Affiliation(s)
- Shuxin Zhang
  - Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
  - Amsterdam Public Health, Methodology & Digital Health, Amsterdam, The Netherlands
- Nirupama Benis
  - Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
  - Amsterdam Public Health, Methodology & Digital Health, Amsterdam, The Netherlands
- Ronald Cornet
  - Department of Medical Informatics, Amsterdam UMC location University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
  - Amsterdam Public Health, Methodology & Digital Health, Amsterdam, The Netherlands
|
12
|
Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B, Zaslavsky L, Zhang J, Bolton EE. PubChem 2023 update. Nucleic Acids Res 2022; 51:D1373-D1380. [PMID: 36305812 PMCID: PMC9825602 DOI: 10.1093/nar/gkac956] [Citation(s) in RCA: 655] [Impact Index Per Article: 327.5] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/13/2022] [Indexed: 01/30/2023] Open
Abstract
PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves a wide range of use cases. In the past two years, a number of changes were made to PubChem. Data from more than 120 data sources was added to PubChem. Some major highlights include: the integration of Google Patents data into PubChem, which greatly expanded the coverage of the PubChem Patent data collection; the creation of the Cell Line and Taxonomy data collections, which provide quick and easy access to chemical information for a given cell line and taxon, respectively; and the update of the bioassay data model. In addition, new functionalities were added to the PubChem programmatic access protocols, PUG-REST and PUG-View, including support for target-centric data download for a given protein, gene, pathway, cell line, and taxon and the addition of the 'standardize' option to PUG-REST, which returns the standardized form of an input chemical structure. A significant update was also made to PubChemRDF. The present paper provides an overview of these changes.
Affiliation(s)
- Sunghwan Kim
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jie Chen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Tiejun Cheng
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Asta Gindulyte
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jia He
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Siqian He
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Qingliang Li
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Benjamin A Shoemaker
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Paul A Thiessen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Bo Yu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Leonid Zaslavsky
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Jian Zhang
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, 20894, USA
- Evan E Bolton
  - To whom correspondence should be addressed. Tel: +1 301 451 1811; Fax: +1 301 480 4559;
|
13
|
Wood EC, Glen AK, Kvarfordt LG, Womack F, Acevedo L, Yoon TS, Ma C, Flores V, Sinha M, Chodpathumwan Y, Termehchy A, Roach JC, Mendoza L, Hoffman AS, Deutsch EW, Koslicki D, Ramsey SA. RTX-KG2: a system for building a semantically standardized knowledge graph for translational biomedicine. BMC Bioinformatics 2022; 23:400. [PMID: 36175836 PMCID: PMC9520835 DOI: 10.1186/s12859-022-04932-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 03/25/2022] [Accepted: 09/14/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biomedical translational science is increasingly using computational reasoning on repositories of structured knowledge (such as UMLS, SemMedDB, ChEMBL, Reactome, DrugBank, and SMPDB) in order to facilitate discovery of new therapeutic targets and modalities. The NCATS Biomedical Data Translator project is working to federate autonomous reasoning agents and knowledge providers within a distributed system for answering translational questions. Within that project and the broader field, there is a need for a framework that can efficiently and reproducibly build an integrated, standards-compliant, and comprehensive biomedical knowledge graph that can be downloaded in standard serialized form or queried via a public application programming interface (API). RESULTS To create a knowledge provider system within the Translator project, we have developed RTX-KG2, an open-source software system for building (and hosting a web API for querying) a biomedical knowledge graph that uses an Extract-Transform-Load approach to integrate 70 knowledge sources (including the aforementioned core six sources) into a knowledge graph with provenance information including (where available) citations. The semantic layer and schema for RTX-KG2 follow the standard Biolink model to maximize interoperability. RTX-KG2 is currently being used by multiple Translator reasoning agents, both in its downloadable form and via its SmartAPI-registered interface. Serializations of RTX-KG2 are available for download in both the pre-canonicalized form and in canonicalized form (in which synonyms are merged). The current canonicalized version (KG2.7.3) of RTX-KG2 contains 6.4M nodes and 39.3M edges with a hierarchy of 77 relationship types from Biolink. CONCLUSION RTX-KG2 is the first knowledge graph that integrates UMLS, SemMedDB, ChEMBL, DrugBank, Reactome, SMPDB, and 64 additional knowledge sources within a knowledge graph that conforms to the Biolink standard for its semantic layer and schema. 
RTX-KG2 is publicly available for querying via its API at arax.rtx.ai/api/rtxkg2/v1.2/openapi.json . The code to build RTX-KG2 is publicly available at github:RTXteam/RTX-KG2 .
Affiliation(s)
- E C Wood
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Amy K Glen
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Lindsey G Kvarfordt
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Finn Womack
  - Computer Science and Engineering, Penn State University, State College, PA, USA
- Liliana Acevedo
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Timothy S Yoon
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Chunyu Ma
  - Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
- Veronica Flores
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Meghamala Sinha
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Arash Termehchy
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
- Andrew S Hoffman
  - Interdisciplinary Hub for Digitalization and Society, Radboud University, Nijmegen, The Netherlands
- David Koslicki
  - Computer Science and Engineering, Penn State University, State College, PA, USA
  - Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
  - Department of Biology, Penn State University, State College, PA, USA
- Stephen A Ramsey
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
  - Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
|
14
|
The Representation of Causality and Causation with Ontologies: A Systematic Literature Review. Online J Public Health Inform 2022; 14:e4. [PMID: 36120162 PMCID: PMC9473331 DOI: 10.5210/ojphi.v14i1.12577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
Abstract
Objective To explore how disease-related causality is formally represented in current ontologies and identify their potential limitations. Methods We conducted a systematic literature search on eight databases (PubMed, Institute of Electrical and Electronics Engineers (IEEE Xplore), Association for Computing Machinery (ACM), Scopus, Web of Science databases, Ontobee, OBO Foundry, and Bioportal). We included studies published between January 1, 1970, and December 9, 2020, that formally represent the notions of causality and causation in the medical domain using ontology as a representational tool. Further inclusion criteria were publication in English and peer-reviewed journals or conference proceedings. Two authors (SS, RM) independently assessed study quality and performed content analysis using a modified validated extraction grid with pre-established categorization. Results The search strategy led to a total of 8,501 potentially relevant papers, of which 50 met the inclusion criteria. Only 14 out of 50 (28%) specified the nature of causation, and only 7 (14%) included clear and non-circular natural language definitions. Although several theories of causality were mentioned, none of the articles offers a widely accepted conceptualization of how causation and causality can be formally represented. Conclusion No current ontology captures the wealth of available concepts of causality. This provides an opportunity for the development of a formal ontology of causation/causality.
|
15
|
Unni DR, Moxon SAT, Bada M, Brush M, Bruskiewich R, Caufield JH, Clemons PA, Dancik V, Dumontier M, Fecho K, Glusman G, Hadlock JJ, Harris NL, Joshi A, Putman T, Qin G, Ramsey SA, Shefchek KA, Solbrig H, Soman K, Thessen AE, Haendel MA, Bizon C, Mungall CJ. Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science. Clin Transl Sci 2022; 15:1848-1855. [PMID: 36125173 PMCID: PMC9372416 DOI: 10.1111/cts.13302] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/04/2022] [Revised: 04/27/2022] [Accepted: 05/02/2022] [Indexed: 12/12/2022] Open
Abstract
Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs has left the task of reconciling data sources to downstream consumers. Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.
Grants
- OT3TR002019 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003445 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003449 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002515 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002584 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003434 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- RM1 HG010860 NHGRI NIH HHS
- OT2TR003433 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003435 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002517 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3TR002027 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003422 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003441 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3TR002020 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3 TR002019 NCATS NIH HHS
- OT2TR003448 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003428 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002520 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003427 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003436 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR002514 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- R24 OD011883 NIH HHS
- OT2TR003443 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3TR002025 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2 TR003428 NCATS NIH HHS
- OT2TR003437 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003450 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT3TR002026 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- OT2TR003430 National Center for Advancing Translational Sciences, Biomedical Data Translator Program
- U.S. Department of Energy
- National Human Genome Research Institute
- National Institutes of Health
Affiliation(s)
- Deepak R. Unni
  - Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Sierra A. T. Moxon
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Michael Bada
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Matthew Brush
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- J. Harry Caufield
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Paul A. Clemons
  - Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, Massachusetts, USA
- Vlado Dancik
  - Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, Massachusetts, USA
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Maastricht, The Netherlands
- Karamarie Fecho
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Nomi L. Harris
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Arpita Joshi
  - Institute for Systems Biology, Seattle, Washington, USA
- Tim Putman
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Stephen A. Ramsey
  - Department of Biomedical Sciences, Oregon State University, Corvallis, Oregon, USA
- Kent A. Shefchek
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Karthik Soman
  - Department of Neurology, University of California San Francisco, San Francisco, California, USA
- Anne E. Thessen
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Melissa A. Haendel
  - Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Chris Bizon
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Christopher J. Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, USA
|
16
|
Blagec K, Barbosa-Silva A, Ott S, Samwald M. A curated, ontology-based, large-scale knowledge graph of artificial intelligence tasks and benchmarks. Sci Data 2022; 9:322. [PMID: 35715466 PMCID: PMC9205953 DOI: 10.1038/s41597-022-01435-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/19/2021] [Accepted: 05/30/2022] [Indexed: 11/22/2022] Open
Abstract
Research in artificial intelligence (AI) is addressing a growing number of tasks through a rapidly growing number of models and methodologies. This makes it difficult to keep track of where novel AI methods are successfully - or still unsuccessfully - applied, how progress is measured, how different advances might synergize with each other, and how future research should be prioritized. To help address these issues, we created the Intelligence Task Ontology and Knowledge Graph (ITO), a comprehensive, richly structured and manually curated resource on artificial intelligence tasks, benchmark results and performance metrics. The current version of ITO contains 685,560 edges, 1,100 classes representing AI processes and 1,995 properties representing performance metrics. The primary goal of ITO is to enable analyses of the global landscape of AI tasks and capabilities. ITO is based on technologies that allow for easy integration and enrichment with external data, automated inference and continuous, collaborative expert curation of underlying ontological models. We make the ITO dataset and a collection of Jupyter notebooks utilizing ITO openly available.
Affiliation(s)
- Kathrin Blagec
  - Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria
- Adriano Barbosa-Silva
  - Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria
- Simon Ott
  - Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria
- Matthias Samwald
  - Medical University of Vienna, Center for Medical Statistics, Informatics and Intelligent Systems, Institute of Artificial Intelligence, Vienna, Austria
|
17
|
Deagen ME, McCusker JP, Fateye T, Stouffer S, Brinson LC, McGuinness DL, Schadler LS. FAIR and Interactive Data Graphics from a Scientific Knowledge Graph. Sci Data 2022; 9:239. [PMID: 35624233 PMCID: PMC9142568 DOI: 10.1038/s41597-022-01352-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 04/26/2022] [Indexed: 11/16/2022] Open
Abstract
Graph databases capture richly linked domain knowledge by integrating heterogeneous data and metadata into a unified representation. Here, we present the use of bespoke, interactive data graphics (bar charts, scatter plots, etc.) for visual exploration of a knowledge graph. By modeling a chart as a set of metadata that describes semantic context (SPARQL query) separately from visual context (Vega-Lite specification), we leverage the high-level, declarative nature of the SPARQL and Vega-Lite grammars to concisely specify web-based, interactive data graphics synchronized to a knowledge graph. Resources with dereferenceable URIs (uniform resource identifiers) can employ the hyperlink encoding channel or image marks in Vega-Lite to amplify the information content of a given data graphic, and published charts populate a browsable gallery of the database. We discuss design considerations that arise in relation to portability, persistence, and performance. Altogether, this pairing of SPARQL and Vega-Lite, demonstrated here in the domain of polymer nanocomposite materials science, offers an extensible approach to FAIR (findable, accessible, interoperable, reusable) scientific data visualization within a knowledge graph framework.
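The separation of semantic context (a SPARQL query) from visual context (a Vega-Lite specification) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Chart` class, the `render_spec` helper, and the `ex:` vocabulary in the query are hypothetical, and the only external conventions assumed are the standard SPARQL 1.1 JSON results layout and Vega-Lite's inline `data.values` form.

```python
from dataclasses import dataclass

# XSD datatype local names treated as numeric when flattening results.
NUMERIC_TYPES = ("integer", "decimal", "double", "float")


@dataclass
class Chart:
    """Hypothetical chart record: the query and the spec are kept apart."""
    title: str
    sparql: str       # semantic context: which data to fetch
    vega_lite: dict   # visual context: how to draw it (no data embedded)


def _value(cell):
    """Convert one SPARQL JSON binding cell to a plain Python value."""
    datatype = cell.get("datatype", "")
    if datatype.rsplit("#", 1)[-1] in NUMERIC_TYPES:
        return float(cell["value"])
    return cell["value"]


def rows_from_sparql_json(results):
    """Flatten SPARQL 1.1 JSON results into a list of row dicts."""
    return [{var: _value(cell) for var, cell in binding.items()}
            for binding in results["results"]["bindings"]]


def render_spec(chart, results):
    """Return a copy of the Vega-Lite spec with query results inlined."""
    spec = dict(chart.vega_lite)  # shallow copy; the stored spec is untouched
    spec["data"] = {"values": rows_from_sparql_json(results)}
    return spec


# Illustrative chart; the ex: terms are made up for this example.
chart = Chart(
    title="Modulus by filler",
    sparql="SELECT ?filler ?modulus WHERE { ?s ex:filler ?filler ; ex:modulus ?modulus }",
    vega_lite={
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "mark": "bar",
        "encoding": {
            "x": {"field": "filler", "type": "nominal"},
            "y": {"field": "modulus", "type": "quantitative"},
        },
    },
)
```

Because the stored specification never contains data, the same chart record can be re-rendered whenever the query returns new results, which is one way to keep a graphic synchronized with the underlying graph.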
Affiliation(s)
- Michael E Deagen
  - Department of Mechanical Engineering, University of Vermont, Burlington, VT, USA
- Jamie P McCusker
  - Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA
- Tolulomo Fateye
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Samuel Stouffer
  - Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA
- L Cate Brinson
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC, USA
- Linda S Schadler
  - Department of Mechanical Engineering, University of Vermont, Burlington, VT, USA
|
18
|
Umberfield EE, Stansbury C, Ford K, Jiang Y, Kardia SLR, Thomer AK, Harris MR. Evaluating and Extending the Informed Consent Ontology for Representing Permissions from the Clinical Domain. APPLIED ONTOLOGY 2022; 17:321-336. [PMID: 36312514 PMCID: PMC9616177 DOI: 10.3233/ao-210260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The purpose of this study was to evaluate, revise, and extend the Informed Consent Ontology (ICO) for expressing clinical permissions, including reuse of residual clinical biospecimens and health data. This study followed a formative evaluation design and used a bottom-up modeling approach. Data were collected from the literature on US federal regulations and a study of clinical consent forms. Eleven federal regulations and fifteen permission-sentences from clinical consent forms were iteratively modeled to identify entities and their relationships, followed by community reflection and negotiation based on a series of predetermined evaluation questions. ICO included fifty-two classes and twelve object properties necessary when modeling, demonstrating appropriateness of extending ICO for the clinical domain. Twenty-six additional classes were imported into ICO from other ontologies, and twelve new classes were recommended for development. This work addresses a critical gap in formally representing clinical permissions, including reuse of residual clinical biospecimens and health data. It makes missing content available to the OBO Foundry, enabling use alongside other widely-adopted biomedical ontologies. ICO serves as a machine-interpretable and interoperable tool for responsible reuse of residual clinical biospecimens and health data at scale.
Affiliation(s)
- Elizabeth E. Umberfield
- Indiana University Richard M Fairbanks School of Public Health, Health Policy & Management; Indianapolis, IN, USA
- Regenstrief Institute Inc, Center for Biomedical Informatics, Indianapolis, IN, USA
| | - Cooper Stansbury
- University of Michigan Medical School, Computational Medicine and Bioinformatics; Ann Arbor, MI, USA
- University of Michigan, Institute for Computational Discovery & Engineering; Ann Arbor, MI, USA
| | | | - Yun Jiang
- University of Michigan School of Nursing, Systems, Populations and Leadership; Ann Arbor, MI, USA
| | - Sharon L. R. Kardia
- University of Michigan School of Public Health, Epidemiology; Ann Arbor, MI, USA
| | - Andrea K. Thomer
- University of Michigan School of Information, Ann Arbor, MI, USA
| | - Marcelline R. Harris
- University of Michigan School of Nursing, Systems, Populations and Leadership; Ann Arbor, MI, USA
| |
|
19
|
van der Velde KJ, Singh G, Kaliyaperumal R, Liao X, de Ridder S, Rebers S, Kerstens HHD, de Andrade F, van Reeuwijk J, De Gruyter FE, Hiltemann S, Ligtvoet M, Weiss MM, van Deutekom HWM, Jansen AML, Stubbs AP, Vissers LELM, Laros JFJ, van Enckevort E, Stemkens D, 't Hoen PAC, Beliën JAM, van Gijn ME, Swertz MA. FAIR Genomes metadata schema promoting Next Generation Sequencing data reuse in Dutch healthcare and research. Sci Data 2022; 9:169. [PMID: 35418585 PMCID: PMC9008059 DOI: 10.1038/s41597-022-01265-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/20/2021] [Accepted: 03/25/2022] [Indexed: 11/08/2022] Open
Abstract
The genomes of thousands of individuals are profiled within Dutch healthcare and research each year. However, this valuable genomic data, associated clinical data and consent are captured in different ways and stored across many systems and organizations. This makes it difficult to discover rare disease patients, reuse data for personalized medicine and establish research cohorts based on specific parameters. FAIR Genomes aims to enable NGS data reuse by developing metadata standards for the data descriptions needed to FAIRify genomic data while also addressing ELSI issues. We developed a semantic schema of essential data elements harmonized with international FAIR initiatives. The FAIR Genomes schema v1.1 contains 110 elements in 9 modules. It reuses common ontologies such as NCIT, DUO and EDAM, only introducing new terms when necessary. The schema is represented by a YAML file that can be transformed into templates for data entry software (EDC) and programmatic interfaces (JSON, RDF) to ease genomic data sharing in research and healthcare. The schema, documentation and MOLGENIS reference implementation are available at https://fairgenomes.org.
Affiliation(s)
- K Joeri van der Velde
- University of Groningen and University Medical Center Groningen, Genomics Coordination Center, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- University of Groningen and University Medical Center Groningen, Department of Genetics, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Gurnoor Singh
- Radboud University Medical Center, Radboud Institute for Molecular Life Sciences, Center for Molecular and Biomolecular Informatics, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
| | - Rajaram Kaliyaperumal
- Leiden University Medical Center, Department of Human Genetics, Einthovenweg 20, 2333 ZC, Leiden, The Netherlands
| | - XiaoFeng Liao
- Radboud University Medical Center, Radboud Institute for Molecular Life Sciences, Center for Molecular and Biomolecular Informatics, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
| | - Sander de Ridder
- Amsterdam University Medical Center, University of Amsterdam, Department of Pathology, Meibergdreef 9, 1105 AZ, Amsterdam, The Netherlands
| | - Susanne Rebers
- The Netherlands Cancer Institute, Division of Molecular Pathology, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands
| | - Hindrik H D Kerstens
- Prinses Máxima Center for Pediatric Oncology, Kemmeren group, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
| | - Fernanda de Andrade
- University of Groningen and University Medical Center Groningen, Genomics Coordination Center, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Jeroen van Reeuwijk
- Radboud University Medical Center, Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
| | - Fini E De Gruyter
- University Medical Center Utrecht, Department of Genetics, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| | - Saskia Hiltemann
- Erasmus Medical Center, Department of Pathology, Doctor Molewaterplein 40, 3015 GD, Rotterdam, The Netherlands
| | - Maarten Ligtvoet
- Nictiz - Dutch competence centre for electronic exchange of health and care information, Oude Middenweg 55, 2491 AC, The Hague, The Netherlands
| | - Marjan M Weiss
- Radboud University Medical Center, Department of Human Genetics, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
| | - Hanneke W M van Deutekom
- University Medical Center Utrecht, Department of Genetics, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| | - Anne M L Jansen
- University Medical Center Utrecht, Department of Pathology, Heidelberglaan 100, 3584 CX, Utrecht, The Netherlands
| | - Andrew P Stubbs
- Erasmus Medical Center, Department of Pathology, Doctor Molewaterplein 40, 3015 GD, Rotterdam, The Netherlands
| | - Lisenka E L M Vissers
- Radboud University Medical Center, Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
| | - Jeroen F J Laros
- Leiden University Medical Center, Department of Human Genetics, Einthovenweg 20, 2333 ZC, Leiden, The Netherlands
- Leiden University Medical Center, Department of Clinical Genetics, Einthovenweg 20, 2333 ZC, Leiden, The Netherlands
- Rijksinstituut voor Volksgezondheid en Milieu, Antonie van Leeuwenhoeklaan 9, 3721 MA, Bilthoven, The Netherlands
| | - Esther van Enckevort
- University of Groningen and University Medical Center Groningen, Genomics Coordination Center, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Daphne Stemkens
- VSOP - Patient Alliance for Rare and Genetic Diseases The Netherlands, Koninginnelaan 23, 3762 DA, Soest, The Netherlands
| | - Peter A C 't Hoen
- Radboud University Medical Center, Radboud Institute for Molecular Life Sciences, Center for Molecular and Biomolecular Informatics, Geert Grooteplein 28, 6525 GA, Nijmegen, The Netherlands
| | - Jeroen A M Beliën
- Amsterdam University Medical Center, Vrije Universiteit Amsterdam, Department of Pathology, De Boelelaan 1117, 1081 HV, Amsterdam, The Netherlands
| | - Mariëlle E van Gijn
- University of Groningen and University Medical Center Groningen, Department of Genetics, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Morris A Swertz
- University of Groningen and University Medical Center Groningen, Genomics Coordination Center, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- University of Groningen and University Medical Center Groningen, Department of Genetics, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| |
|
20
|
Marchesin S, Silvello G. TBGA: a large-scale Gene-Disease Association dataset for Biomedical Relation Extraction. BMC Bioinformatics 2022; 23:111. [PMID: 35361129 PMCID: PMC8973894 DOI: 10.1186/s12859-022-04646-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/29/2021] [Accepted: 03/22/2022] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Databases are fundamental to advance biomedical science. However, most of them are populated and updated with a great deal of human effort. Biomedical Relation Extraction (BioRE) aims to shift this burden to machines. Among its different applications, the discovery of Gene-Disease Associations (GDAs) is one of BioRE's most relevant tasks. Nevertheless, few resources have been developed to train models for GDA extraction. Besides, these resources are all limited in size, preventing models from scaling effectively to large amounts of data. RESULTS To overcome this limitation, we have exploited the DisGeNET database to build a large-scale, semi-automatically annotated dataset for GDA extraction. DisGeNET stores one of the largest available collections of genes and variants involved in human diseases. Relying on DisGeNET, we developed TBGA: a GDA extraction dataset generated from more than 700K publications that consists of over 200K instances and 100K gene-disease pairs. Each instance consists of the sentence from which the GDA was extracted, the corresponding GDA, and the information about the gene-disease pair. CONCLUSIONS TBGA is amongst the largest datasets for GDA extraction. We have evaluated state-of-the-art models for GDA extraction on TBGA, showing that it is a challenging and well-suited dataset for the task. We made the dataset publicly available to foster the development of state-of-the-art BioRE models for GDA extraction.
Affiliation(s)
- Stefano Marchesin
- Department of Information Engineering, University of Padova, Padova, Italy
| | - Gianmaria Silvello
- Department of Information Engineering, University of Padova, Padova, Italy
| |
|
21
|
Strömert P, Hunold J, Castro A, Neumann S, Koepler O. Ontologies4Chem: the landscape of ontologies in chemistry. PURE APPL CHEM 2022. [DOI: 10.1515/pac-2021-2007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
For a long time, databases such as CAS, Reaxys, PubChem or ChemSpider have mostly relied on unique numerical identifiers or chemical structure identifiers like InChI, SMILES or others to link data across heterogeneous data sources. The retrospective processing of information and fragmented data from text publications to maintain these databases is a cumbersome process. Ontologies are a holistic approach to semantically describe data, information and knowledge of a domain. They provide terms, relations and logic to semantically annotate and link data, building knowledge graphs. The application of standard taxonomies and vocabularies from the very beginning of data generation and along research workflows in electronic lab notebooks (ELNs), software tools, and their final publication in data repositories creates FAIR data straightforwardly. Thus a proper semantic description of an investigation and the why, how, where, when, and by whom data was produced, in conjunction with the description and representation of research data, is a natural outcome, in contrast to the retrospective processing of research publications as we know it. In this work we provide an overview of ontologies in chemistry suitable to represent concepts of research and research data. These ontologies are evaluated against several criteria derived from the FAIR data principles and their possible application in the digitisation of research data management workflows.
Affiliation(s)
- Philip Strömert
- TIB – Leibniz Information Centre for Science and Technology, Welfengarten 1 B, 30167 Hannover, Germany
| | - Johannes Hunold
- TIB – Leibniz Information Centre for Science and Technology, Welfengarten 1 B, 30167 Hannover, Germany
| | - André Castro
- TIB – Leibniz Information Centre for Science and Technology, Welfengarten 1 B, 30167 Hannover, Germany
| | - Steffen Neumann
- Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle, Germany
| | - Oliver Koepler
- TIB – Leibniz Information Centre for Science and Technology, Welfengarten 1 B, 30167 Hannover, Germany
| |
|
22
|
Kaliyaperumal R, Wilkinson MD, Moreno PA, Benis N, Cornet R, Dos Santos Vieira B, Dumontier M, Bernabé CH, Jacobsen A, Le Cornec CMA, Godoy MP, Queralt-Rosinach N, Schultze Kool LJ, Swertz MA, van Damme P, van der Velde KJ, Lalout N, Zhang S, Roos M. Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data. J Biomed Semantics 2022; 13:9. [PMID: 35292119 PMCID: PMC8922780 DOI: 10.1186/s13326-022-00264-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/26/2021] [Accepted: 02/23/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. RESULTS Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. CONCLUSIONS Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.
Affiliation(s)
| | - Mark D Wilkinson
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid (UPM), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Pozuelo de Alarcón, Madrid, ES, Spain
| | - Pablo Alarcón Moreno
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid (UPM), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Pozuelo de Alarcón, Madrid, ES, Spain
| | - Nirupama Benis
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
| | - Ronald Cornet
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
| | - Bruna Dos Santos Vieira
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands; Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Michel Dumontier
- Institute of Data Science, Paul-Henri Spaaklaan 1, Maastricht University, 6229EN, Maastricht, The Netherlands
| | | | | | - Clémence M A Le Cornec
- Division of Paediatric Nephrology, Centre for Paediatrics and Adolescent Medicine, University of Heidelberg, Heidelberg, Germany
| | - Mario Prieto Godoy
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas (CBGP), Universidad Politécnica de Madrid (UPM), Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Pozuelo de Alarcón, Madrid, ES, Spain
| | | | - Leo J Schultze Kool
- Department of Medical Imaging, Radboud University Medical Center, Nijmegen, The Netherlands
| | - Morris A Swertz
- University of Groningen and University Medical Center Groningen, Genomics Coordination Center and Department of Genetics, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Philip van Damme
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
| | - K Joeri van der Velde
- University of Groningen and University Medical Center Groningen, Genomics Coordination Center and Department of Genetics, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
| | - Nawel Lalout
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands; Duchenne Parent Project, Veenendaal, The Netherlands
| | - Shuxin Zhang
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
| | - Marco Roos
- Leiden University Medical Center, Leiden, The Netherlands
| |
|
23
|
Mortensen HM, Martens M, Senn J, Levey T, Evelo CT, Willighagen EL, Exner T. The AOP-DB RDF: Applying FAIR Principles to the Semantic Integration of AOP Data Using the Research Description Framework. FRONTIERS IN TOXICOLOGY 2022; 4:803983. [PMID: 35295213 PMCID: PMC8915825 DOI: 10.3389/ftox.2022.803983] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/28/2021] [Accepted: 01/13/2022] [Indexed: 01/12/2023] Open
Abstract
Computational toxicology is central to the current transformation occurring in toxicology and chemical risk assessment. There is a need for more efficient use of existing data to characterize human toxicological response data for environmental chemicals in the US and Europe. The Adverse Outcome Pathway (AOP) framework helps to organize existing mechanistic information and contributes to what is currently being described as New Approach Methodologies (NAMs). AOP knowledge and data are currently submitted directly by users and stored in the AOP-Wiki (https://aopwiki.org/). Automatic and systematic parsing of AOP-Wiki data is challenging, so we have created the EPA Adverse Outcome Pathway Database. The AOP-DB, developed by the US EPA to assist in the biological and mechanistic characterization of AOP data, provides a broad, systems-level overview of the biological context of AOPs. Here we describe the recent semantic mapping efforts for the AOP-DB, and how this process facilitates the integration of AOP-DB data with other toxicologically relevant datasets through a use case example.
Affiliation(s)
- Holly M. Mortensen
- United States Environmental Protection Agency, Office of Research and Development, Center for Public Health and Environmental Assessment, Research Triangle Park, Durham, NC, United States
| | - Marvin Martens
- Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, Netherlands
| | - Jonathan Senn
- Oak Ridge Associated Universities, Oak Ridge, TN, United States
| | - Trevor Levey
- Oak Ridge Associated Universities, Oak Ridge, TN, United States
- SAS Institute, Cary, NC, United States
| | - Chris T. Evelo
- Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, Netherlands
- Maastricht Centre for Systems Biology, Maastricht University, Maastricht, Netherlands
| | - Egon L. Willighagen
- Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, Netherlands
| | | |
|
24
|
Kim S, Cheng T, He S, Thiessen PA, Li Q, Gindulyte A, Bolton EE. PubChem Protein, Gene, Pathway, and Taxonomy Data Collections: Bridging Biology and Chemistry through Target-Centric Views of PubChem Data. J Mol Biol 2022; 434:167514. [DOI: 10.1016/j.jmb.2022.167514] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/24/2021] [Revised: 02/17/2022] [Accepted: 02/22/2022] [Indexed: 12/21/2022]
|
25
|
Ławrynowicz A, Wróblewska A, Adrian WT, Kulczyński B, Gramza-Michałowska A. Food Recipe Ingredient Substitution Ontology Design Pattern. SENSORS 2022; 22:s22031095. [PMID: 35161841 PMCID: PMC8837940 DOI: 10.3390/s22031095] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 01/27/2022] [Accepted: 01/28/2022] [Indexed: 11/29/2022]
Abstract
This paper describes a notion of substitutions in food recipes and their ontology design pattern. We build upon state-of-the-art models for food and process. We also present scenarios and examples for the design pattern. Finally, the pattern is mapped to available and relevant domain ontologies and made publicly available at the ontologydesignpatterns.org portal.
Affiliation(s)
- Agnieszka Ławrynowicz
- Center for Artificial Intelligence and Machine Learning (CAMIL), Faculty of Computing and Telecommunications, Poznan University of Technology, 60-965 Poznań, Poland
| | - Anna Wróblewska
- Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
| | - Weronika T. Adrian
- Applied Computer Science Department, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Science and Technology, al. A. Mickiewicza 30, 30-059 Krakow, Poland
| | - Bartosz Kulczyński
- Department of Gastronomy Science and Functional Foods, Faculty of Food Science and Nutrition, Poznań University of Life Sciences, 60-637 Poznań, Poland
| | - Anna Gramza-Michałowska
- Department of Gastronomy Science and Functional Foods, Faculty of Food Science and Nutrition, Poznań University of Life Sciences, 60-637 Poznań, Poland
| |
|
26
|
Dealing with the Ambiguity of Glycan Substructure Search. MOLECULES (BASEL, SWITZERLAND) 2021; 27:molecules27010065. [PMID: 35011294 PMCID: PMC8746581 DOI: 10.3390/molecules27010065] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 12/17/2021] [Accepted: 12/17/2021] [Indexed: 01/15/2023]
Abstract
The level of ambiguity in describing glycan structure has significantly increased with the upsurge of large-scale glycomics and glycoproteomics experiments. Consequently, an ontology-based model appears as an appropriate solution for navigating these data. However, navigation is not sufficient and the model should also enable advanced search and comparison. A new ontology with a tree logical structure is introduced to represent glycan structures irrespective of the precision of molecular details. The model heavily relies on the GlycoCT encoding of glycan structures. Its implementation in the GlySTreeM knowledge base was validated with GlyConnect data and benchmarked with the Glycowork library. GlySTreeM is shown to be fast, consistent, reliable and more flexible than existing solutions for matching parts of or whole glycan structures. The model is also well suited for painless future expansion.
|
27
|
Wan L, Song J, He V, Roman J, Whah G, Peng S, Zhang L, He Y. Development of the International Classification of Diseases Ontology (ICDO) and its application for COVID-19 diagnostic data analysis. BMC Bioinformatics 2021; 22:508. [PMID: 34663204 PMCID: PMC8522253 DOI: 10.1186/s12859-021-04402-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Accepted: 09/24/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The 10th and 9th revisions of the International Statistical Classification of Diseases and Related Health Problems (ICD10 and ICD9) have been adopted worldwide as a well-recognized norm to share codes for diseases, signs and symptoms, abnormal findings, etc. The international Consortium for Clinical Characterization of COVID-19 by EHR (4CE) website stores COVID-19 disease diagnosis data using ICD10 and ICD9 codes. However, the ICD systems are difficult to decode due to their many shortcomings, which can be addressed using ontology. METHODS An ICD ontology (ICDO) was developed to logically and scientifically represent ICD terms and the relations among them. ICDO is also aligned with the Basic Formal Ontology (BFO) and reuses terms from existing ontologies. As a use case, the ICD10 and ICD9 diagnosis data from the 4CE website were extracted, mapped to ICDO, and analyzed using ICDO. RESULTS We have developed the ICDO to ontologize the ICD terms and relations. Different from existing disease ontologies, all ICD diseases in ICDO are defined as disease processes to describe their occurrence with other properties. The ICDO decomposes each disease term into different components, including anatomic entities, process profiles, etiological causes, output phenotype, etc. Over 900 ICD terms have been represented in ICDO. Many ICDO terms are presented in both English and Chinese. The ICD10/ICD9-based diagnosis data of over 27,000 COVID-19 patients from 5 countries were extracted from the 4CE. A total of 917 COVID-19-related disease codes, each of which was associated with 1 or more cases in the 4CE dataset, were mapped to ICDO and further analyzed using the ICDO logical annotations. Our study showed that COVID-19 targeted multiple systems and organs such as the lung, heart, and kidney. Different acute and chronic kidney phenotypes were identified. Some kidney diseases appeared to result from other diseases, such as diabetes. Some of the findings could only be easily found using ICDO instead of ICD9/10. CONCLUSIONS ICDO was developed to ontologize ICD10/9 codes and applied to study COVID-19 patient diagnosis data. Our findings showed that ICDO provides a semantic platform for more accurate detection of disease profiles.
Affiliation(s)
- Ling Wan
- University of Michigan Medical School, Ann Arbor, MI 48109 USA
- OntoWise, Nanjing, Jiangsu China
| | - Justin Song
- Cranbrook Kingswood Upper School, Bloomfield Hills, MI 48304 USA
| | | | - Jennifer Roman
- College of Literacy, Science, and Arts, University of Michigan, Ann Arbor, MI 48109 USA
| | - Grace Whah
- College of Engineering, University of Michigan, Ann Arbor, MI 48109 USA
| | - Suyuan Peng
- School of Public Health, Peking University, Beijing, China
- National Institute of Health Data Science, Peking University, Beijing, China
| | - Luxia Zhang
- National Institute of Health Data Science, Peking University, Beijing, China
- Advanced Institute of Information Technology, Peking University, Hangzhou, China
- Renal Division, Department of Medicine, Peking University First Hospital, Peking University Institute of Nephrology, Beijing, China
| | - Yongqun He
- University of Michigan Medical School, Ann Arbor, MI 48109 USA
| |
|
28
|
Delmas M, Filangi O, Paulhe N, Vinson F, Duperier C, Garrier W, Saunier PE, Pitarch Y, Jourdan F, Giacomoni F, Frainay C. FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases. Bioinformatics 2021; 37:3896-3904. [PMID: 34478489 PMCID: PMC8570811 DOI: 10.1093/bioinformatics/btab627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2021] [Revised: 08/16/2021] [Accepted: 09/01/2021] [Indexed: 11/22/2022] Open
Abstract
Motivation Metabolomics studies aim at reporting a metabolic signature (list of metabolites) related to a particular experimental condition. These signatures are instrumental in the identification of biomarkers or classification of individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. Results The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support results interpretation and further inquiries. Availability and implementation A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM KG, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- M Delmas
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| | - O Filangi
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, Le Rheu, 35653, France
| | - N Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
| | - F Vinson
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| | - C Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
| | - W Garrier
- ISIMA, Campus des Cézeaux, Aubière, 63177, France
| | - P-E Saunier
- ISIMA, Campus des Cézeaux, Aubière, 63177, France
| | - Y Pitarch
- IRIT, Université de Toulouse, Cours Rose Dieng-Kuntz, Toulouse, 31400, France
| | - F Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| | - F Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d'Exploration du Métabolisme, MetaboHUB Clermont, Clermont-Ferrand, F-63000, France
| | - C Frainay
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, 31300, France
| |
|
29
|
Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021; 13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022] Open
Abstract
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on the PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project. 
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
Affiliation(s)
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
|
30
|
A resource to explore the discovery of rare diseases and their causative genes. Sci Data 2021; 8:124. [PMID: 33947870 PMCID: PMC8096966 DOI: 10.1038/s41597-021-00905-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/12/2020] [Accepted: 03/26/2021] [Indexed: 12/28/2022] Open
Abstract
Here, we describe a dataset with information about monogenic, rare diseases with a known genetic background, supplemented with manually extracted provenance for the disease itself and the discovery of the underlying genetic cause. We assembled a collection of 4166 rare monogenic diseases and linked them to 3163 causative genes, annotated with OMIM and Ensembl identifiers and HGNC symbols. The PubMed identifiers of the publications that first described the rare diseases, and of the publications that identified the causative genes, were added using information from OMIM, PubMed, Wikipedia, whonamedit.com, and Google Scholar. The data are available under a CC0 license as a spreadsheet and as RDF in a semantic model modified from DisGeNET, and were added to Wikidata. This dataset relies on publicly available data and publications with a PubMed identifier, but our effort to make the data interoperable and linked now makes it possible to analyse them. Our analysis revealed the timeline of rare disease and causative gene discovery and links these discoveries to developments in methods.
|
31
|
Kamdar MR, Musen MA. An empirical meta-analysis of the life sciences linked open data on the web. Sci Data 2021; 8:24. [PMID: 33479214 PMCID: PMC7819992 DOI: 10.1038/s41597-021-00797-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/04/2020] [Accepted: 12/04/2020] [Indexed: 01/29/2023] Open
Abstract
While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 biomedical linked open data sources into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.
Affiliation(s)
- Maulik R Kamdar
  - Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
  - Elsevier Health Markets, Philadelphia, PA, USA
- Mark A Musen
  - Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA
|
32
|
Irshad O, Ghani Khan MU. Formalization and Semantic Integration of Heterogeneous Omics Annotations for Exploratory Searches. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200127122818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aim:
To help researchers and practitioners uncover the functional aspects of the human cellular system through exploratory searches over semantically integrated, heterogeneous and geographically dispersed omics annotations.
Background:
Improving health standards is one of the motives that continuously drives researchers and practitioners to uncover the poorly understood aspects of the human cellular system. Inferring new knowledge from known facts always requires a reasonably large amount of data in a well-structured, integrated and unified form. With the advent of high-throughput and sensor technologies in particular, biological data are growing at an astronomical rate and are becoming increasingly heterogeneous and geographically dispersed. Several data integration systems have been deployed to cope with data heterogeneity and global dispersion. Systems based on semantic data integration models are more flexible and expandable than syntax-based ones but still lack aspect-based data integration, persistence and querying. Furthermore, these systems do not fully support warehousing biological entities in the form of the semantic associations they naturally possess in the human cell.
Objective:
To develop an aspect-oriented formal data integration model that semantically integrates heterogeneous and geographically dispersed omics annotations and supports exploratory querying of the integrated data.
Method:
We propose an aspect-oriented formal data integration model that uses web semantics standards to formally specify each of its constructs. The proposed model supports aspect-oriented representation of biological entities while addressing data heterogeneity and global dispersion. It associates and warehouses biological entities in the way they relate with
Result:
To show the significance of the proposed model, we developed a data warehouse and information retrieval system based on a multi-layered, multi-modular software architecture compliant with the proposed model. Results show that our model supports gathering, associating, integrating, persisting and querying each entity with respect to all of its possible aspects within or across the associated omics layers.
Conclusion:
Formal specifications help address data integration issues by providing formal means for understanding omics data based on meaning instead of syntax.
Affiliation(s)
- Omer Irshad
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
- Muhammad Usman Ghani Khan
  - Department of Computer Science & Engineering, Faculty of Electrical Engineering, The University of Engineering and Technology, Lahore, Pakistan
|
33
|
van der Velde KJ, van den Hoek S, van Dijk F, Hendriksen D, van Diemen CC, Johansson LF, Abbott KM, Deelen P, Sikkema‐Raddatz B, Swertz MA. A pipeline-friendly software tool for genome diagnostics to prioritize genes by matching patient symptoms to literature. ADVANCED GENETICS (HOBOKEN, N.J.) 2020; 1:e10023. [PMID: 36619248 PMCID: PMC9744518 DOI: 10.1002/ggn2.10023] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/18/2019] [Revised: 02/12/2020] [Accepted: 03/20/2020] [Indexed: 04/11/2023]
Abstract
Despite an explosive growth of next-generation sequencing data, genome diagnostics only provides a molecular diagnosis to a minority of patients. Software tools that prioritize genes based on patient symptoms using known gene-disease associations may complement variant filtering and interpretation to increase chances of success. However, many of these tools cannot be used in practice because they are embedded within variant prioritization algorithms, or exist as remote services that cannot be relied upon or are unacceptable because of legal/ethical barriers. In addition, many tools are not designed for command-line usage, or are closed-source, abandoned, or unavailable. We present Variant Interpretation using Biomedical literature Evidence (VIBE), a tool to prioritize disease genes based on Human Phenotype Ontology codes. VIBE is a locally installed executable that ensures operational availability and is built upon DisGeNET-RDF, a comprehensive knowledge platform containing gene-disease associations mostly from literature and variant-disease associations mostly from curated source databases. VIBE's command-line interface and output are designed for easy incorporation into bioinformatic pipelines that annotate and prioritize variants for further clinical interpretation. We evaluate VIBE in a benchmark based on 305 patient cases alongside seven other tools. Our results demonstrate that VIBE offers consistent performance with few cases missed, but we also find high complementarity among all tested tools. VIBE is a powerful, free, open source and locally installable solution for prioritizing genes based on patient symptoms. Project source code, documentation, benchmark and executables are available at https://github.com/molgenis/vibe.
Affiliation(s)
- K. Joeri van der Velde
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Sander van den Hoek
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Freerk van Dijk
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Prinses Maxima Center for Child Oncology, Utrecht, The Netherlands
- Dennis Hendriksen
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Cleo C. van Diemen
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Lennart F. Johansson
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Kristin M. Abbott
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Patrick Deelen
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Birgit Sikkema‐Raddatz
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
- Morris A. Swertz
  - Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
  - Department of Genetics, University of Groningen and University Medical Center Groningen, Groningen, The Netherlands
|
34
|
Rashid SM, McCusker JP, Pinheiro P, Bax MP, Santos H, Stingone JA, Das AK, McGuinness DL. The Semantic Data Dictionary - An Approach for Describing and Annotating Data. DATA INTELLIGENCE 2020; 2:443-486. [PMID: 33103120 DOI: 10.1162/dint_a_00058] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/04/2022] Open
Abstract
It is common practice for data providers to include text descriptions for each column when publishing datasets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a dataset, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse datasets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can be, and has been, used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey dataset, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large NIH-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.
Affiliation(s)
- Marcello P Bax
  - Universidade Federal de Minas Gerais, Belo Horizonte, MG, 31270-901, BR
- Jeanette A Stingone
  - Columbia University, Mailman School of Public Health, New York, NY, 10032, USA
|
35
|
Brinson LC, Deagen M, Chen W, McCusker J, McGuinness DL, Schadler LS, Palmeri M, Ghumman U, Lin A, Hu B. Polymer Nanocomposite Data: Curation, Frameworks, Access, and Potential for Discovery and Design. ACS Macro Lett 2020; 9:1086-1094. [PMID: 35653211 DOI: 10.1021/acsmacrolett.0c00264] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Abstract
With the advent of the materials genome initiative (MGI) in the United States and a similar focus on materials data around the world, a number of materials data resources and associated vocabularies, tools, and repositories have been developed. While the majority of systems focus on slices of computational data with an emphasis on metallic alloys, NanoMine is an open source platform with the goal of curating and storing widely varying experimental data on polymer nanocomposites (polymers doped with nanoparticles) and providing access to characterization and analysis tools with the long-term objective of promoting facile nanocomposite design. Data on over 2500 samples from the literature and individual laboratories has been curated to date into NanoMine, including 230 samples from the papers bound in this virtual issue. This virtual issue represents an experiment of the flexibility of the data repository to capture the unique experimental metadata requirements of many data sets at one time and to challenge the authors to participate in the curation of their research data associated with a given publication. In principle, NanoMine offers a FAIR platform in which data published in papers becomes directly Findable and Accessible via simple search tools, with open metadata standards that are Interoperable with larger materials data registries, and allows easy Reuse of data, e.g. benchmarking against new results. Our hope is that with time, platforms such as this one could capture much of the newly published data on materials and form nodes in an interconnected materials data ecosystem which would allow researchers to robustly archive their data, add to the growing body of readily accessible data, and enable new forms of discovery by application of data analysis and design tools.
Affiliation(s)
- L Catherine Brinson
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Michael Deagen
  - Department of Mechanical Engineering, University of Vermont, Burlington, Vermont 05405, United States
- Wei Chen
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
- James McCusker
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Deborah L McGuinness
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York 12180, United States
- Linda S Schadler
  - Department of Mechanical Engineering, University of Vermont, Burlington, Vermont 05405, United States
- Marc Palmeri
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Umar Ghumman
  - Department of Mechanical Engineering, Northwestern University, Evanston, Illinois 60208, United States
- Anqi Lin
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
- Bingyin Hu
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, North Carolina 27708, United States
|
36
|
Piñero J, Ramírez-Anguita JM, Saüch-Pitarch J, Ronzano F, Centeno E, Sanz F, Furlong LI. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res 2020; 48:D845-D855. [PMID: 31680165 PMCID: PMC7145631 DOI: 10.1093/nar/gkz1021] [Citation(s) in RCA: 819] [Impact Index Per Article: 204.8] [Received: 09/14/2019] [Revised: 10/14/2019] [Accepted: 10/18/2019] [Indexed: 02/07/2023] Open
Abstract
One of the most pressing challenges in genomic medicine is to understand the role played by genetic variation in health and disease. Thanks to the exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. However, the identification of variants of clinical relevance is a significant challenge that requires comprehensive interrogation of previous knowledge and linkage to new experimental results. To assist in this complex task, we created DisGeNET (http://www.disgenet.org/), a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, including the scientific literature. DisGeNET covers the full spectrum of human diseases as well as normal and abnormal traits. The current release covers more than 24 000 diseases and traits, 17 000 genes and 117 000 genomic variants. The latest developments of DisGeNET include new sources of data, novel data attributes and prioritization metrics, a redesigned web interface and recently launched APIs. Thanks to the data standardization, the combination of expert curated information with data automatically mined from the scientific literature, and a suite of tools for accessing its publicly available data, DisGeNET is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.
Affiliation(s)
- Janet Piñero
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Juan Manuel Ramírez-Anguita
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Josep Saüch-Pitarch
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Francesco Ronzano
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Emilio Centeno
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Ferran Sanz
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
- Laura I Furlong
  - Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain
|
37
|
Vos RA, Katayama T, Mishima H, Kawano S, Kawashima S, Kim JD, Moriya Y, Tokimatsu T, Yamaguchi A, Yamamoto Y, Wu H, Amstutz P, Antezana E, Aoki NP, Arakawa K, Bolleman JT, Bolton E, Bonnal RJP, Bono H, Burger K, Chiba H, Cohen KB, Deutsch EW, Fernández-Breis JT, Fu G, Fujisawa T, Fukushima A, García A, Goto N, Groza T, Hercus C, Hoehndorf R, Itaya K, Juty N, Kawashima T, Kim JH, Kinjo AR, Kotera M, Kozaki K, Kumagai S, Kushida T, Lütteke T, Matsubara M, Miyamoto J, Mohsen A, Mori H, Naito Y, Nakazato T, Nguyen-Xuan J, Nishida K, Nishida N, Nishide H, Ogishima S, Ohta T, Okuda S, Paten B, Perret JL, Prathipati P, Prins P, Queralt-Rosinach N, Shinmachi D, Suzuki S, Tabata T, Takatsuki T, Taylor K, Thompson M, Uchiyama I, Vieira B, Wei CH, Wilkinson M, Yamada I, Yamanaka R, Yoshitake K, Yoshizawa AC, Dumontier M, Kosaki K, Takagi T. BioHackathon 2015: Semantics of data for life sciences and reproducible research. F1000Res 2020; 9:136. [PMID: 32308977 PMCID: PMC7141167 DOI: 10.12688/f1000research.18236.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 02/05/2020] [Indexed: 01/08/2023] Open
Abstract
We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.
Affiliation(s)
- Rutger A. Vos
  - Institute of Biology Leiden, Leiden University, Leiden, The Netherlands
  - Naturalis Biodiversity Center, Leiden, The Netherlands
- Hiroyuki Mishima
  - Department of Human Genetics, Nagasaki University Graduate School of Biomedical Sciences, Nagasaki, Japan
- Shin Kawano
  - Database Center for Life Science, Tokyo, Japan
- Yuki Moriya
  - Database Center for Life Science, Tokyo, Japan
- Hongyan Wu
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Erick Antezana
  - Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Nobuyuki P. Aoki
  - Faculty of Science and Engineering, SOKA University, Tokyo, Japan
- Kazuharu Arakawa
  - Institute for Advanced Biosciences, Keio University, Tokyo, Japan
- Jerven T. Bolleman
  - SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Lausanne, Switzerland
- Evan Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
- Raoul J. P. Bonnal
  - Istituto Nazionale Genetica Molecolare, Romeo ed Enrica Invernizzi, Milan, Italy
- Kees Burger
  - Dutch Techcentre for Life Sciences, Utrecht, The Netherlands
- Hirokazu Chiba
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
- Kevin B. Cohen
  - Computational Bioscience Program, University of Colorado School of Medicine, Denver, USA
  - Université Paris-Saclay, LIMSI, CNRS, Paris, France
- Gang Fu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
- Naohisa Goto
  - Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Tudor Groza
  - St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Darlinghurst, Australia
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
- Colin Hercus
  - Novocraft Technologies Sdn. Bhd., Selangor, Malaysia
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Kotone Itaya
  - Institute for Advanced Biosciences, Keio University, Tokyo, Japan
- Nick Juty
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Jee-Hyub Kim
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Akira R. Kinjo
  - Institute for Protein Research, Osaka University, Osaka, Japan
- Masaaki Kotera
  - School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Kouji Kozaki
  - The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
- Tatsuya Kushida
  - National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
- Thomas Lütteke
  - Institute of Veterinary Physiology and Biochemistry, Justus-Liebig University Giessen, Giessen, Germany
  - Gesellschaft für innovative Personalwirtschaftssysteme mbH (GIP GmbH), Offenbach, Germany
- Attayeb Mohsen
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Hiroshi Mori
  - Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Yuki Naito
  - Database Center for Life Science, Tokyo, Japan
- Naoki Nishida
  - Department of Systems Science, Osaka University, Osaka, Japan
- Hiroyo Nishide
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
- Soichi Ogishima
  - Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Tazro Ohta
  - Database Center for Life Science, Tokyo, Japan
- Shujiro Okuda
  - Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, USA
- Philip Prathipati
  - National Institutes of Biomedical Innovation, Health and Nutrition, Osaka, Japan
- Pjotr Prins
  - University Medical Center Utrecht, Utrecht, The Netherlands
  - University of Tennessee Health Science Center, Memphis, USA
- Núria Queralt-Rosinach
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Shinya Suzuki
  - School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
- Tsuyosi Tabata
  - Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto, Japan
- Kieron Taylor
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Mark Thompson
  - Leiden University Medical Center, Leiden, The Netherlands
- Ikuo Uchiyama
  - National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Japan
- Bruno Vieira
  - WurmLab, School of Biological & Chemical Sciences, Queen Mary University of London, London, UK
- Chih-Hsuan Wei
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
- Mark Wilkinson
  - Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain
- Kazutoshi Yoshitake
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Michel Dumontier
  - Institute of Data Science, Maastricht University, Maastricht, The Netherlands
- Kenjiro Kosaki
  - Center for Medical Genetics, Keio University School of Medicine, Tokyo, Japan
- Toshihisa Takagi
  - National Bioscience Database Center, Japan Science and Technology Agency, Tokyo, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
|
38
|
Semantic Publication of Agricultural Scientific Literature Using Property Graphs. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10030861] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022]
Abstract
During the last decades, changes in science have led to a large increase in the number of articles published every year. This increase creates a new difficulty for scientists, who must make an extra effort to select the literature relevant to their work. In this work, we present a pipeline for the generation of scientific literature knowledge graphs in the agriculture domain. The pipeline combines Semantic Web and natural language processing technologies, which make data understandable by computer agents, enabling the development of end-user applications for literature searches. This workflow consists of (1) RDF generation, including metadata and contents; (2) semantic annotation of the content; and (3) property graph population by adding domain knowledge from ontologies, in addition to the previously generated RDF data describing the articles. This pipeline was applied to a set of 127 agriculture articles, generating a knowledge graph implemented in Neo4j, publicly available on Docker. The potential of our model is illustrated through a series of queries and use cases, which not only include queries about authors or references but also deal with article similarity or clustering based on semantic annotation, which is facilitated by the inclusion of domain ontologies in the graph.
|
39
|
Conford B, Almsaeed A, Buehler S, Childers CP, Ficklin SP, Staton ME, Poelchau MF. Tripal EUtils: a Tripal module to increase exchange and reuse of genome assembly metadata. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5709695. [PMID: 31960040 DOI: 10.1093/database/baz143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/02/2019] [Revised: 11/04/2019] [Accepted: 11/17/2019] [Indexed: 11/13/2022]
Abstract
Data and metadata interoperability between data storage systems is a critical component of the FAIR data principles. Programmatic and consistent means of reconciling metadata models between databases promote data exchange and thus increase access to the data for the scientific community. This process requires (i) metadata mapping between the models and (ii) software to perform the mapping. Here, we describe our efforts to map metadata associated with genome assemblies between the National Center for Biotechnology Information (NCBI) data resources and the Chado biological database schema. We present mappings for multiple NCBI data structures and introduce a Tripal software module, Tripal EUtils, to pull metadata from NCBI into a Tripal/Chado database. We discuss potential mapping challenges and solutions and provide suggestions for future development to further increase interoperability between these platforms. Database URL: https://github.com/NAL-i5K/tripal_eutils.
Affiliation(s)
- B Conford
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
| | - A Almsaeed
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
| | - S Buehler
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
| | - C P Childers
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
| | - S P Ficklin
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
| | - M E Staton
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
| | - M F Poelchau
- United States Department of Agriculture, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Avenue, Beltsville, MD 20705, USA
| |
|
40
|
Kafkas Ş, Abdelhakim M, Hashish Y, Kulmanov M, Abdellatif M, Schofield PN, Hoehndorf R. PathoPhenoDB, linking human pathogens to their phenotypes in support of infectious disease research. Sci Data 2019; 6:79. [PMID: 31160594 PMCID: PMC6546783 DOI: 10.1038/s41597-019-0090-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2018] [Accepted: 05/07/2019] [Indexed: 12/11/2022] Open
Abstract
Understanding the relationship between the pathophysiology of infectious disease, the biology of the causative agent and the development of therapeutic and diagnostic approaches is dependent on the synthesis of a wide range of types of information. Provision of a comprehensive and integrated disease phenotype knowledgebase has the potential to provide novel and orthogonal sources of information for the understanding of infectious agent pathogenesis, and support for research on disease mechanisms. We have developed PathoPhenoDB, a database containing pathogen-to-phenotype associations. PathoPhenoDB relies on manual curation of pathogen-disease relations, and on ontology-based text mining as well as manual curation to associate host disease phenotypes with infectious agents. Using Semantic Web technologies, PathoPhenoDB also links to knowledge about drug resistance mechanisms and drugs used in the treatment of infectious diseases. PathoPhenoDB is accessible at http://patho.phenomebrowser.net/, and the data are freely available through a public SPARQL endpoint.
Affiliation(s)
- Şenay Kafkas
- Computer, Electrical and Mathematical Sciences & Engineering with Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Marwa Abdelhakim
- Computer, Electrical and Mathematical Sciences & Engineering with Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Yasmeen Hashish
- Computer, Electrical and Mathematical Sciences & Engineering with Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences & Engineering with Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Marwa Abdellatif
- Computer, Electrical and Mathematical Sciences & Engineering with Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Downing Street, Cambridge, CB2 3EG, United Kingdom
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering with Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955, Saudi Arabia
| |
|
41
|
Sood AJ, Viner C, Hoffman MM. DNAmod: the DNA modification database. J Cheminform 2019; 11:30. [PMID: 31016417 PMCID: PMC6478773 DOI: 10.1186/s13321-019-0349-4] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2018] [Accepted: 03/25/2019] [Indexed: 11/10/2022] Open
Abstract
Covalent DNA modifications, such as 5-methylcytosine (5mC), are increasingly the focus of numerous research programs. In eukaryotes, both 5mC and 5-hydroxymethylcytosine (5hmC) are now recognized as stable epigenetic marks, with diverse functions. Bacteria, archaea, and viruses contain various other modified DNA nucleobases. Numerous databases describe RNA and histone modifications, but no database specifically catalogues DNA modifications, despite their broad importance in epigenetic regulation. To address this need, we have developed DNAmod: the DNA modification database. DNAmod is an open-source database (https://dnamod.hoffmanlab.org) that catalogues DNA modifications and provides a single source to learn about their properties. DNAmod provides a web interface to easily browse and search through these modifications. The database annotates the chemical properties and structures of all curated modified DNA bases, and a much larger list of candidate chemical entities. DNAmod includes manual annotations of available sequencing methods and descriptions of their occurrence in nature, and provides existing and suggested nomenclature. DNAmod enables researchers to rapidly review previous work, select mapping techniques, and track recent developments concerning modified bases of interest.
Affiliation(s)
- Ankur Jai Sood
- Department of Medical Biophysics, University of Toronto, Princess Margaret Cancer Research Tower 15-701, 101 College Street, Toronto, ON M5G 1L7 Canada
- Princess Margaret Cancer Centre, Princess Margaret Cancer Research Tower 11-311, 101 College Street, Toronto, ON M5G 1L7 Canada
| | - Coby Viner
- Princess Margaret Cancer Centre, Princess Margaret Cancer Research Tower 11-311, 101 College Street, Toronto, ON M5G 1L7 Canada
- Department of Computer Science, University of Toronto, Sandford Fleming Building 3302, 10 King’s College Road, Toronto, ON M5S 3G4 Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Princess Margaret Cancer Research Tower 15-701, 101 College Street, Toronto, ON M5G 1L7 Canada
- Princess Margaret Cancer Centre, Princess Margaret Cancer Research Tower 11-311, 101 College Street, Toronto, ON M5G 1L7 Canada
- Department of Computer Science, University of Toronto, Sandford Fleming Building 3302, 10 King’s College Road, Toronto, ON M5S 3G4 Canada
- Vector Institute, MaRS Centre, West Tower, Suite 710, 661 University Avenue, Toronto, ON M5G 1M1 Canada
| |
|
42
|
Abstract
The goal of the NIMH RDoC initiative is to establish a biological basis for mental illness that includes linking cognition to molecular biology. A key challenge lies in how to represent such large, complex, and multi-scale knowledge in a manner that can support computational analysis, including query answering. Formal ontologies, such as the Semanticscience Integrated Ontology (SIO), offer a scaffold in which complex domain knowledge such as neurological and cognitive functions can be represented and linked to knowledge of molecular biology. In this article, we explore the use of SIO to represent concepts in molecular biology and in cognition. We extend SIO to traditional cognitive topics by illustrating axioms for both an information-processing and a neuroscience perspective on reading. We next discuss the NIMH RDoC taxonomy and include SIO axioms for the units-of-analysis and functions-of-behavior dimensions. An example demonstrates its use of deductive reasoning to establish causal relations across RDoC dimensions. From a broader perspective this article demonstrates how informatics can assist in integrating work in clinical psychology, cognitive psychology, cognitive neuroscience, computer science, molecular biology, and philosophy.
Affiliation(s)
- Stephen K Reed
- Psychology and CRMSE, San Diego State University, San Diego, California, USA
| | - Michel Dumontier
- Institute of Data Science, Maastricht University, The Netherlands
| |
|
43
|
Katayama T, Kawashima S, Okamoto S, Moriya Y, Chiba H, Naito Y, Fujisawa T, Mori H, Takagi T. TogoGenome/TogoStanza: modularized Semantic Web genome database. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2019; 2019:5277251. [PMID: 30624651 PMCID: PMC6323299 DOI: 10.1093/database/bay132] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/08/2018] [Accepted: 11/26/2018] [Indexed: 11/12/2022]
Abstract
TogoGenome is a genome database based purely on Semantic Web technology, which enables the integration of heterogeneous data and flexible semantic searches. All the information is stored as Resource Description Framework (RDF) data, and the reporting web pages are generated on the fly using SPARQL Protocol and RDF Query Language (SPARQL) queries. TogoGenome provides a semantic-faceted search system by gene functional annotation, taxonomy, phenotypes and environment based on the relevant ontologies. TogoGenome also serves as an interface to conduct semantic comparative genomics by which a user can observe pan-organism or organism-specific genes based on the functional aspect of gene annotations and the combinations of organisms from different taxa. The TogoGenome database exhibits a modularized structure, and each module in the report pages is separately served as TogoStanza, which is a generic framework for rendering an information block as IFRAME/Web Components, which can, unlike several other monolithic databases, also be reused to construct other databases. TogoGenome and TogoStanza have been under development since 2012 and are freely available along with their source code on the GitHub repositories at https://github.com/togogenome/ and https://github.com/togostanza/, respectively, under the MIT license.
Affiliation(s)
- Toshiaki Katayama
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
| | - Shuichi Kawashima
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
| | - Shinobu Okamoto
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
| | - Yuki Moriya
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
| | - Hirokazu Chiba
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
| | - Yuki Naito
- Database Center for Life Science, Joint Support-Center for Data Science Research, Research Organization of Information and Systems, Wakashiba, Kashiwa-shi, Chiba, Japan
| | | | - Hiroshi Mori
- National Institute of Genetics, Mishima, Shizuoka, Japan
| | - Toshihisa Takagi
- National Institute of Genetics, Mishima, Shizuoka, Japan
- Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Yayoi, Bunkyo-ku, Tokyo, Japan
| |
|
44
|
Jonquet C, Toulet A, Dutta B, Emonet V. Harnessing the Power of Unified Metadata in an Ontology Repository: The Case of AgroPortal. JOURNAL ON DATA SEMANTICS 2018. [DOI: 10.1007/s13740-018-0091-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
45
|
Thompson P, Daikou S, Ueno K, Batista-Navarro R, Tsujii J, Ananiadou S. Annotation and detection of drug effects in text for pharmacovigilance. J Cheminform 2018; 10:37. [PMID: 30105604 PMCID: PMC6089860 DOI: 10.1186/s13321-018-0290-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2018] [Accepted: 07/20/2018] [Indexed: 02/02/2023] Open
Abstract
Pharmacovigilance (PV) databases record the benefits and risks of different drugs, as a means to ensure their safe and effective use. Creating and maintaining such resources can be complex, since a particular medication may have divergent effects in different individuals, due to specific patient characteristics and/or interactions with other drugs being administered. Textual information from various sources can provide important evidence to curators of PV databases about the usage and effects of drug targets in different medical subjects. However, the efficient identification of relevant evidence can be challenging, due to the increasing volume of textual data. Text mining (TM) techniques can support curators by automatically detecting complex information, such as interactions between drugs, diseases and adverse effects. This semantic information supports the quick identification of documents containing information of interest (e.g., the different types of patients in which a given adverse drug reaction has been observed to occur). TM tools are typically adapted to different domains by applying machine learning methods to corpora that are manually labelled by domain experts using annotation guidelines to ensure consistency. We present a semantically annotated corpus of 597 MEDLINE abstracts, PHAEDRA, encoding rich information on drug effects and their interactions, whose quality is assured through the use of detailed annotation guidelines and the demonstration of high levels of inter-annotator agreement (e.g., 92.6% F-Score for identifying named entities and 78.4% F-Score for identifying complex events, when relaxed matching criteria are applied). To our knowledge, the corpus is unique in the domain of PV, according to the level of detail of its annotations. To illustrate the utility of the corpus, we have trained TM tools based on its rich labels to recognise drug effects in text automatically. The corpus and annotation guidelines are available at: http://www.nactem.ac.uk/PHAEDRA/.
Affiliation(s)
- Paul Thompson
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
| | - Sophia Daikou
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
| | - Kenju Ueno
- Artificial Intelligence Research Center, National Research and Development Agency (AIST), Tokyo Waterfront 2-3-2 Aomi, Koto-ku, Tokyo, 135-0064 Japan
| | - Riza Batista-Navarro
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
| | - Jun’ichi Tsujii
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
- Artificial Intelligence Research Center, National Research and Development Agency (AIST), Tokyo Waterfront 2-3-2 Aomi, Koto-ku, Tokyo, 135-0064 Japan
| | - Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN UK
| |
|
46
|
Brandizi M, Singh A, Rawlings C, Hassani-Pak K. Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach. J Integr Bioinform 2018; 15:/j/jib.ahead-of-print/jib-2018-0023/jib-2018-0023.xml. [PMID: 30085931 PMCID: PMC6340125 DOI: 10.1515/jib-2018-0023] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2018] [Accepted: 06/07/2018] [Indexed: 01/01/2023] Open
Abstract
The speed and accuracy of new scientific discoveries – be it by humans or artificial intelligence – depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIRness data principles).
Affiliation(s)
- Marco Brandizi
- Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
| | - Ajit Singh
- Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
| | - Christopher Rawlings
- Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
| | - Keywan Hassani-Pak
- Rothamsted Research, Computational and Analytical Sciences Department, Harpenden, AL5 2JQ, UK
| |
|
47
|
Hu W, Qiu H, Huang J, Dumontier M. BioSearch: a semantic search engine for Bio2RDF. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2017:4079799. [PMID: 29220451 PMCID: PMC5569678 DOI: 10.1093/database/bax059] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/28/2016] [Accepted: 07/10/2017] [Indexed: 12/14/2022]
Abstract
Biomedical data are growing at an incredible pace and require substantial expertise to organize in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch, a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state-of-the-art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/
Affiliation(s)
- Wei Hu
- State Key Laboratory for Novel Software Technology, Nanjing University, China
- Institute of Data Science, Maastricht University, The Netherlands
| | - Honglei Qiu
- State Key Laboratory for Novel Software Technology, Nanjing University, China
| | - Jiacheng Huang
- State Key Laboratory for Novel Software Technology, Nanjing University, China
| | - Michel Dumontier
- Institute of Data Science, Maastricht University, The Netherlands
| |
|
48
|
|
49
|
Tang A, Tam R, Cadrin-Chênevert A, Guest W, Chong J, Barfett J, Chepelev L, Cairns R, Mitchell JR, Cicero MD, Poudrette MG, Jaremko JL, Reinhold C, Gallix B, Gray B, Geis R, O'Connell T, Babyn P, Koff D, Ferguson D, Derkatch S, Bilbily A, Shabana W. Canadian Association of Radiologists White Paper on Artificial Intelligence in Radiology. Can Assoc Radiol J 2018; 69:120-135. [DOI: 10.1016/j.carj.2018.02.002] [Citation(s) in RCA: 238] [Impact Index Per Article: 39.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2018] [Accepted: 02/13/2018] [Indexed: 02/07/2023] Open
Abstract
Artificial intelligence (AI) is rapidly moving from an experimental phase to an implementation phase in many fields, including medicine. The combination of improved availability of large datasets, increasing computing power, and advances in learning algorithms has created major performance breakthroughs in the development of AI applications. In the last 5 years, AI techniques known as deep learning have delivered rapidly improving performance in image recognition, caption generation, and speech recognition. Radiology, in particular, is a prime candidate for early adoption of these techniques. It is anticipated that the implementation of AI in radiology over the next decade will significantly improve the quality, value, and depth of radiology's contribution to patient care and population health, and will revolutionize radiologists' workflows. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI working group with the mandate to discuss and deliberate on practice, policy, and patient care issues related to the introduction and implementation of AI in imaging. This white paper provides recommendations for the CAR derived from deliberations between members of the AI working group. This white paper on AI in radiology will inform CAR members and policymakers on key terminology, educational needs of members, research and development, partnerships, potential clinical applications, implementation, structure and governance, role of radiologists, and potential impact of AI on radiology in Canada.
Affiliation(s)
- An Tang
- Department of Radiology, Université de Montréal, Montréal, Québec, Canada
- Centre de recherche du Centre hospitalier de l'Université de Montréal, Montréal, Québec, Canada
| | - Roger Tam
- Department of Radiology, University of British Columbia, Vancouver, British Columbia, Canada
- School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Will Guest
- Department of Radiology, University of British Columbia, Vancouver, British Columbia, Canada
| | - Jaron Chong
- Department of Radiology, McGill University Health Center, Montréal, Québec, Canada
| | - Joseph Barfett
- Department of Medical Imaging, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Leonid Chepelev
- Department of Radiology, University of Ottawa, Ottawa, Ontario, Canada
| | - Robyn Cairns
- Department of Radiology, British Columbia's Children's Hospital, University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Mark D. Cicero
- Department of Medical Imaging, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada
| | | | - Jacob L. Jaremko
- Department of Radiology and Diagnostic Imaging, University of Alberta, Edmonton, Alberta, Canada
| | - Caroline Reinhold
- Department of Radiology, McGill University Health Center, Montréal, Québec, Canada
| | - Benoit Gallix
- Department of Radiology, McGill University Health Center, Montréal, Québec, Canada
| | - Bruce Gray
- Department of Medical Imaging, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada
| | - Raym Geis
- Department of Radiology, National Jewish Health, Denver, Colorado, USA
| |
|
50
|
Opinion: Why we need a centralized repository for isotopic data. Proc Natl Acad Sci U S A 2018; 114:2997-3001. [PMID: 28325883 DOI: 10.1073/pnas.1701742114] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|