1
|
Kellmann AJ, Postema M, de Keijser J, Svetachov P, Wilson RC, van Enckevort EJ, Swertz MA. Visualization and exploration of linked data using virtual reality. Database (Oxford) 2024; 2024:baae008. [PMID: 38554132 PMCID: PMC11184448 DOI: 10.1093/database/baae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Revised: 12/18/2023] [Accepted: 01/25/2024] [Indexed: 04/01/2024]
Abstract
In this report, we analyse the use of virtual reality (VR) as a method to navigate and explore complex knowledge graphs. Over the past few decades, linked data technologies [Resource Description Framework (RDF) and Web Ontology Language (OWL)] have shown to be valuable to encode such graphs and many tools have emerged to interactively visualize RDF. However, as knowledge graphs get larger, most of these tools struggle with the limitations of 2D screens or 3D projections. Therefore, in this paper, we evaluate the use of VR to visually explore SPARQL Protocol and RDF Query Language (SPARQL) (construct) queries, including a series of tutorial videos that demonstrate the power of VR (see Graph2VR tutorial playlist: https://www.youtube.com/playlist?list=PLRQCsKSUyhNIdUzBNRTmE-_JmuiOEZbdH). We first review existing methods for Linked Data visualization and then report the creation of a prototype, Graph2VR. Finally, we report a first evaluation of the use of VR for exploring linked data graphs. Our results show that most participants enjoyed testing Graph2VR and found it to be a useful tool for graph exploration and data discovery. The usability study also provides valuable insights for potential future improvements to Linked Data visualization in VR.
Collapse
Affiliation(s)
- Alexander J Kellmann
- Department of Genetics, University of Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
- Department of Genetics, University Medical Center Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
| | - Max Postema
- Department of Genetics, University Medical Center Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
| | - Joris de Keijser
- Department of Genetics, University Medical Center Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
| | - Pjotr Svetachov
- Center of information technology, University of Groningen, Nettelbosje 1, Groningen, Groningen 9747 AJ, The Netherlands
| | - Rebecca C Wilson
- Public Health, Policy & Systems, University of Liverpool, Block B, 1st Floor, Waterhouse Building, 1-5 Dover Street, Liverpool L69 3GL, United Kingdom
| | - Esther J van Enckevort
- Department of Genetics, University of Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
- Department of Genetics, University Medical Center Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
| | - Morris A Swertz
- Department of Genetics, University of Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
- Department of Genetics, University Medical Center Groningen, Antonius Deusinglaan 1, Groningen, Groningen 9713 AV, The Netherlands
| |
Collapse
|
2
|
Slenter DN, Hemel IMGM, Evelo CT, Bierau J, Willighagen EL, Steinbusch LKM. Extending inherited metabolic disorder diagnostics with biomarker interaction visualizations. Orphanet J Rare Dis 2023; 18:95. [PMID: 37101200 PMCID: PMC10131334 DOI: 10.1186/s13023-023-02683-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2022] [Accepted: 04/02/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND Inherited Metabolic Disorders (IMDs) are rare diseases where one impaired protein leads to a cascade of changes in the adjacent chemical conversions. IMDs often present with non-specific symptoms, a lack of a clear genotype-phenotype correlation, and de novo mutations, complicating diagnosis. Furthermore, products of one metabolic conversion can be the substrate of another pathway obscuring biomarker identification and causing overlapping biomarkers for different disorders. Visualization of the connections between metabolic biomarkers and the enzymes involved might aid in the diagnostic process. The goal of this study was to provide a proof-of-concept framework for integrating knowledge of metabolic interactions with real-life patient data before scaling up this approach. This framework was tested on two groups of well-studied and related metabolic pathways (the urea cycle and pyrimidine de-novo synthesis). The lessons learned from our approach will help to scale up the framework and support the diagnosis of other less-understood IMDs. METHODS Our framework integrates literature and expert knowledge into machine-readable pathway models, including relevant urine biomarkers and their interactions. The clinical data of 16 previously diagnosed patients with various pyrimidine and urea cycle disorders were visualized on the top 3 relevant pathways. Two expert laboratory scientists evaluated the resulting visualizations to derive a diagnosis. RESULTS The proof-of-concept platform resulted in varying numbers of relevant biomarkers (five to 48), pathways, and pathway interactions for each patient. The two experts reached the same conclusions for all samples with our proposed framework as with the current metabolic diagnostic pipeline. For nine patient samples, the diagnosis was made without knowledge about clinical symptoms or sex. For the remaining seven cases, four interpretations pointed in the direction of a subset of disorders, while three cases were found to be undiagnosable with the available data. Diagnosing these patients would require additional testing besides biochemical analysis. CONCLUSION The presented framework shows how metabolic interaction knowledge can be integrated with clinical data in one visualization, which can be relevant for future analysis of difficult patient cases and untargeted metabolomics data. Several challenges were identified during the development of this framework, which should be resolved before this approach can be scaled up and implemented to support the diagnosis of other (less understood) IMDs. The framework could be extended with other OMICS data (e.g. genomics, transcriptomics), and phenotypic data, as well as linked to other knowledge captured as Linked Open Data.
Collapse
Affiliation(s)
- Denise N Slenter
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands.
| | - Irene M G M Hemel
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
| | - Chris T Evelo
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
| | - Jörgen Bierau
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
- Department of Clinical Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
| | - Egon L Willighagen
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
| | - Laura K M Steinbusch
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands
| |
Collapse
|
3
|
Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021; 13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022] Open
Abstract
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primary focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project. Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
Collapse
Affiliation(s)
- Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic.
| | - Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
| |
Collapse
|
4
|
Diverse Taxonomies for Diverse Chemistries: Enhanced Representation of Natural Product Metabolism in UniProtKB. Metabolites 2021; 11:metabo11010048. [PMID: 33445429 PMCID: PMC7827101 DOI: 10.3390/metabo11010048] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2020] [Revised: 01/05/2021] [Accepted: 01/07/2021] [Indexed: 01/28/2023] Open
Abstract
The UniProt Knowledgebase UniProtKB is a comprehensive, high-quality, and freely accessible resource of protein sequences and functional annotation that covers genomes and proteomes from tens of thousands of taxa, including a broad range of plants and microorganisms producing natural products of medical, nutritional, and agronomical interest. Here we describe work that enhances the utility of UniProtKB as a support for both the study of natural products and for their discovery. The foundation of this work is an improved representation of natural product metabolism in UniProtKB using Rhea, an expert-curated knowledgebase of biochemical reactions, that is built on the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Knowledge of natural products and precursors is captured in ChEBI, enzyme-catalyzed reactions in Rhea, and enzymes in UniProtKB/Swiss-Prot, thereby linking chemical structure data directly to protein knowledge. We provide a practical demonstration of how users can search UniProtKB for protein knowledge relevant to natural products through interactive or programmatic queries using metabolite names and synonyms, chemical identifiers, chemical classes, and chemical structures and show how to federate UniProtKB with other data and knowledge resources and tools using semantic web technologies such as RDF and SPARQL. All UniProtKB data are freely available for download in a broad range of formats for users to further mine or exploit as an annotation source, to enrich other natural product datasets and databases.
Collapse
|
5
|
Alexandrov T. Spatial Metabolomics and Imaging Mass Spectrometry in the Age of Artificial Intelligence. Annu Rev Biomed Data Sci 2020; 3:61-87. [PMID: 34056560 DOI: 10.1146/annurev-biodatasci-011420-031537] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Spatial metabolomics is an emerging field of omics research that has enabled localizing metabolites, lipids, and drugs in tissue sections, a feat considered impossible just two decades ago. Spatial metabolomics and its enabling technology-imaging mass spectrometry-generate big hyper-spectral imaging data that have motivated the development of tailored computational methods at the intersection of computational metabolomics and image analysis. Experimental and computational developments have recently opened doors to applications of spatial metabolomics in life sciences and biomedicine. At the same time, these advances have coincided with a rapid evolution in machine learning, deep learning, and artificial intelligence, which are transforming our everyday life and promise to revolutionize biology and healthcare. Here, we introduce spatial metabolomics through the eyes of a computational scientist, review the outstanding challenges, provide a look into the future, and discuss opportunities granted by the ongoing convergence of human and artificial intelligence.
Collapse
Affiliation(s)
- Theodore Alexandrov
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany.,Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, USA
| |
Collapse
|
6
|
Kratochvíl M, Vondrášek J, Galgonek J. Interoperable chemical structure search service. J Cheminform 2019; 11:45. [PMID: 31254167 PMCID: PMC6599361 DOI: 10.1186/s13321-019-0367-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 06/21/2019] [Indexed: 11/12/2022] Open
Abstract
Motivation The existing connections between large databases of chemicals, proteins, metabolites and assays offer valuable resources for research in fields ranging from drug design to metabolomics. Transparent search across multiple databases provides a way to efficiently utilize these resources. To simplify such searches, many databases have adopted semantic technologies that allow interoperable querying of the datasets using SPARQL query language. However, the interoperable interfaces of the chemical databases still lack the functionality of structure-driven chemical search, which is a fundamental method of data discovery in the chemical search space. Results We present a SPARQL service that augments existing semantic services by making interoperable substructure and similarity searches in small-molecule databases possible. The service thus offers new possibilities for querying interoperable databases, and simplifies writing of heterogeneous queries that include chemical-structure search terms. Availability The service is freely available and accessible using a standard SPARQL endpoint interface. The service documentation and user-oriented demonstration interfaces that allow quick explorative querying of datasets are available at https://idsm.elixir-czech.cz.
Collapse
Affiliation(s)
- Miroslav Kratochvíl
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic.,Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Malostranské náměstí 25, 118 00, Prague 1, Czech Republic
| | - Jiří Vondrášek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
| | - Jakub Galgonek
- Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic.
| |
Collapse
|