1
Galgonek J, Vondrášek J. The IDSM mass spectrometry extension: searching mass spectra using SPARQL. Bioinformatics 2024; 40:btae174. [PMID: 38561173 PMCID: PMC11034985 DOI: 10.1093/bioinformatics/btae174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 02/24/2024] [Accepted: 03/28/2024] [Indexed: 04/04/2024] [Open Access]
Abstract
SUMMARY The Integrated Database of Small Molecules (IDSM) integrates data from small-molecule datasets, making them accessible through the SPARQL query language. Its unique feature is the ability to search for compounds through SPARQL based on their molecular structure. We extended IDSM so that mass spectra databases can be integrated and searched by mass spectrum similarity. As sources of mass spectra, we employed the MassBank of North America database and the In Silico Spectral Database of natural products. AVAILABILITY AND IMPLEMENTATION The extension is an integral part of IDSM, which is available at https://idsm.elixir-czech.cz. The manual and usage examples are available at https://idsm.elixir-czech.cz/docs/ms. The source code of all IDSM components is available under open-source licences at https://github.com/idsm-src.
Affiliation(s)
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, Prague 160 00, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo náměstí 2, Prague 160 00, Czech Republic
2
Altenhoff A, Bairoch A, Bansal P, Baratin D, Bastian F, Bolleman J, Bridge A, Burdet F, Crameri K, Dauvillier J, Dessimoz C, Gehant S, Glover N, Gnodtke K, Hayes C, Ibberson M, Kriventseva E, Kuznetsov D, Frédérique L, Mehl F, Mendes de Farias T, Michel PA, Moretti S, Morgat A, Österle S, Pagni M, Redaschi N, Robinson-Rechavi M, Samarasinghe K, Sima AC, Szklarczyk D, Topalov O, Touré V, Unni D, von Mering C, Wollbrett J, Zahn-Zabal M, Zdobnov E. The SIB Swiss Institute of Bioinformatics Semantic Web of data. Nucleic Acids Res 2024; 52:D44-D51. [PMID: 37878411 PMCID: PMC10767860 DOI: 10.1093/nar/gkad902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 10/02/2023] [Accepted: 10/05/2023] [Indexed: 10/27/2023] [Open Access]
Abstract
The SIB Swiss Institute of Bioinformatics (https://www.sib.swiss/) is a federation of bioinformatics research and service groups. The international life science community in academia and industry has been accessing the freely available databases provided by SIB since its inception in 1998. In this paper we present the 11 databases which currently offer semantically enriched data in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable), as well as the Swiss Personalized Health Network initiative (SPHN), which also employs this enrichment. The semantic enrichment facilitates the manipulation of large data sets from public databases and private data sets. Examples are provided to illustrate that the data from the SIB databases can be queried using precise criteria not only individually but also across multiple databases, including a variety of non-SIB databases. Data manipulation, be it exploration, extraction, annotation, combination, or publication, is possible using the SPARQL query language. Providing documentation, tutorials and sample queries makes it easier to navigate this web of semantic data. Through this paper, the reader will discover how the existing SIB knowledge graphs can be leveraged to tackle the complex biological or clinical questions that are being addressed today.
3
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 DOI: 10.26434/chemrxiv.13524191] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/22/2023]
Abstract
INTRODUCTION The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. MATERIALS AND METHODS We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property-object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. RESULTS The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. DISCUSSION SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. CONCLUSION Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T Evelo
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
4
Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Applied In Vitro Toxicology 2022; 8:2-13. [PMID: 35388368 PMCID: PMC8978481 DOI: 10.1089/aivt.2021.0010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]
Abstract
Introduction: The AOP-Wiki is the main platform for the development and storage of adverse outcome pathways (AOPs). These AOPs describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content, and to link it with chemical and biological resources. Together, this allows more detailed exploration, which can be automated. Materials and Methods: We converted the complete AOP-Wiki content into resource description framework (RDF) triples. We used >20 ontologies for the semantic annotation of property–object relations, including the Chemical Information Ontology, Dublin Core, and the AOP Ontology. Results: The resulting RDF contains >122,000 triples describing 158 unique properties of >15,000 unique subjects. Furthermore, >3500 link-outs were added to 12 chemical databases, and >7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at https://aopwiki.rdf.bigcat-bioinformatics.org. Discussion: SPARQL queries can be used to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power that the use of this new resource provides becomes apparent when combining the content with external databases using federated queries. Conclusion: Overall, the AOP-Wiki RDF allows new ways to explore the rapidly growing AOP knowledge and makes the integration of this database in automated workflows possible, making the AOP-Wiki more FAIR.
Affiliation(s)
- Marvin Martens
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
- Chris T. Evelo
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L. Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, and Maastricht University, Maastricht, The Netherlands
5
Altenhoff AM, Train CM, Gilbert KJ, Mediratta I, Mendes de Farias T, Moi D, Nevers Y, Radoykova HS, Rossier V, Warwick Vesztrocy A, Glover NM, Dessimoz C. OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more. Nucleic Acids Res 2021; 49:D373-D379. [PMID: 33174605 PMCID: PMC7779010 DOI: 10.1093/nar/gkaa1007] [Citation(s) in RCA: 99] [Impact Index Per Article: 33.0] [Received: 09/15/2020] [Revised: 10/10/2020] [Accepted: 10/14/2020] [Indexed: 01/11/2023] [Open Access]
Abstract
OMA is an established resource to elucidate evolutionary relationships among genes from currently 2326 genomes covering all domains of life. OMA provides pairwise and groupwise orthologs, functional annotations, local and global gene order conservation (synteny) information, among many other functions. This update paper describes the reorganisation of the database into gene-, group- and genome-centric pages. Other new and improved features are detailed, such as reporting of the evolutionarily best conserved isoforms of alternatively spliced genes, the inferred local order of ancestral genes, phylogenetic profiling, better cross-references, fast genome mapping, semantic data sharing via RDF, as well as a special coronavirus OMA with 119 viruses from the Nidovirales order, including SARS-CoV-2, the agent of the COVID-19 pandemic. We conclude with improvements to the documentation of the resource through primers, tutorials and short videos. OMA is accessible at https://omabrowser.org.
Affiliation(s)
- Adrian M Altenhoff
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - ETH Zurich, Computer Science, Universitätstr. 6, 8092 Zurich, Switzerland
- Clément-Marie Train
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Kimberly J Gilbert
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Ishita Mediratta
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Department of Computer Science and Information Systems, BITS Pilani K.K. Birla Goa Campus, India
- David Moi
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Yannis Nevers
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Hale-Seda Radoykova
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
  - Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
- Victor Rossier
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Alex Warwick Vesztrocy
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Natasha M Glover
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
  - Centre for Life's Origins and Evolution, Department of Genetics, Evolution and Environment, University College London, Gower St, London WC1E 6BT, United Kingdom
  - Department of Computer Science, University College London, Gower St, London WC1E 6BT, United Kingdom
6
Protein ontology on the semantic web for knowledge discovery. Sci Data 2020; 7:337. [PMID: 33046717 PMCID: PMC7550340 DOI: 10.1038/s41597-020-00679-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/20/2020] [Accepted: 09/17/2020] [Indexed: 11/26/2022] [Open Access]
Abstract
The Protein Ontology (PRO) provides an ontological representation of protein-related entities, ranging from protein families to proteoforms to complexes. Protein Ontology Linked Open Data (LOD) exposes, shares, and connects knowledge about protein-related entities on the Semantic Web using Resource Description Framework (RDF), thus enabling integration with other Linked Open Data for biological knowledge discovery. For example, proteins (or variants thereof) can be retrieved on the basis of specific disease associations. As a community resource, we strive to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles, disseminate regular updates of our data, support multiple methods for accessing, querying and downloading data in various formats, and provide documentation both for scientists and programmers. PRO Linked Open Data can be browsed via faceted browser interface and queried using SPARQL via YASGUI. RDF data dumps are also available for download. Additionally, we developed RESTful APIs to support programmatic data access. We also provide W3C HCLS specification compliant metadata description for our data. The PRO Linked Open Data is available at https://lod.proconsortium.org/.
7
Abstract
UniProt continues to support the ongoing process of making scientific data FAIR. Here we contribute to this process with a FAIRness assessment of our UniProtKB dataset followed by a critical reflection on the challenges and future directions of the adoption and validation of the FAIR principles and metrics.
8
Trifan A, Oliveira JL. Patient data discovery platforms as enablers of biomedical and translational research: A systematic review. J Biomed Inform 2019; 93:103154. [PMID: 30922867 DOI: 10.1016/j.jbi.2019.103154] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/07/2018] [Revised: 03/15/2019] [Accepted: 03/18/2019] [Indexed: 11/28/2022]
Abstract
BACKGROUND The global shift from paper health records to electronic ones has led to an impressive growth of biomedical digital data over the past two decades. Exploring and extracting knowledge from these data has the potential to enhance translational research and lead to positive outcomes for the population's health and healthcare. OBJECTIVE The aim of this study was to conduct a systematic review to identify software platforms that enable discovery, secondary use and interoperability of biomedical data. Additionally, we aimed to evaluate the identified solutions in terms of clinical interest and main healthcare-related outcomes. METHODS A systematic search of the scientific literature published and indexed in PubMed between January 2014 and September 2018 was performed. Inclusion criteria were as follows: relevance for the topic of biomedical data discovery, English language, and free full text. To increase recall, we developed a semi-automatic and incremental methodology to retrieve articles that cite one or more of the previous set. RESULTS A total of 500 candidate papers were retrieved through this methodology. Of these, 85 were eligible for abstract assessment. Finally, 37 studies qualified for a full-text review, and 20 provided enough information for the study objectives. CONCLUSIONS This study revealed that biomedical discovery platforms are both a current necessity and a significantly innovative agent in the area of healthcare. The outcomes that were identified, in terms of scientific publications, clinical studies and research collaborations, stand as evidence.