Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chen B, Dong X, Jiao D, Wang H, Zhu Q, Ding Y, Wild DJ. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics 2010;11:255. [PMID: 20478034 PMCID: PMC2881087 DOI: 10.1186/1471-2105-11-255] [Citation(s) in RCA: 156] [Impact Index Per Article: 11.1] [Received: 11/23/2009] [Accepted: 05/17/2010] [Indexed: 11/24/2022] Open

Cited by Other Article(s)

Semantic Data Visualisation for Biomedical Database Catalogues. Healthcare (Basel) 2022;10:healthcare10112287. [DOI: 10.3390/healthcare10112287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/08/2022] [Accepted: 11/10/2022] [Indexed: 11/16/2022] Open

Ebeid IA. MedGraph: A semantic biomedical information retrieval framework using knowledge graph embedding for PubMed. Front Big Data 2022;5:965619. [PMID: 36338335 PMCID: PMC9627348 DOI: 10.3389/fdata.2022.965619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 09/20/2022] [Indexed: 01/24/2023] Open

Yang JJ, Gessner CR, Duerksen JL, Biber D, Binder JL, Ozturk M, Foote B, McEntire R, Stirling K, Ding Y, Wild DJ. Knowledge graph analytics platform with LINCS and IDG for Parkinson's disease target illumination. BMC Bioinformatics 2022;23:37. [PMID: 35021991 PMCID: PMC8756622 DOI: 10.1186/s12859-021-04530-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/12/2021] [Accepted: 12/13/2021] [Indexed: 11/12/2022] Open

Abstract

Background

LINCS, "Library of Integrated Network-based Cellular Signatures", and IDG, "Illuminating the Druggable Genome", are both NIH projects and consortia that have generated rich datasets for the study of the molecular basis of human health and disease. LINCS L1000 expression signatures provide unbiased systems/omics experimental evidence. IDG provides compiled and curated knowledge for illumination and prioritization of novel drug target hypotheses. Together, these resources can support a powerful new approach to identifying novel drug targets for complex diseases, such as Parkinson's disease (PD), which continues to inflict severe harm on human health and resists traditional research approaches.

Results

Integrating LINCS and IDG, we built the Knowledge Graph Analytics Platform (KGAP) to support an important use case: identification and prioritization of drug target hypotheses for associated diseases. The KGAP approach includes strong semantics interpretable by domain scientists and a robust, high performance implementation of a graph database and related analytical methods. Illustrating the value of our approach, we investigated results from queries relevant to PD. Approved PD drug indications from IDG’s resource DrugCentral were used as starting points for evidence paths exploring chemogenomic space via LINCS expression signatures for associated genes, evaluated as target hypotheses by integration with IDG. The KG-analytic scoring function was validated against a gold standard dataset of genes associated with PD as elucidated, published mechanism-of-action drug targets, also from DrugCentral. IDG's resource TIN-X was used to rank and filter KGAP results for novel PD targets, and one, SYNGR3 (Synaptogyrin-3), was manually investigated further as a case study and plausible new drug target for PD.

Conclusions

The synergy of LINCS and IDG, via KG methods, empowers graph analytics methods for the investigation of the molecular basis of complex diseases, and specifically for identification and prioritization of novel drug targets. The KGAP approach enables downstream applications via integration with resources similarly aligned with modern KG methodology. The generality of the approach indicates that KGAP is applicable to many disease areas, in addition to PD, the focus of this paper.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04530-9.


Shaker B, Ahmad S, Lee J, Jung C, Na D. In silico methods and tools for drug discovery. Comput Biol Med 2021;137:104851. [PMID: 34520990 DOI: 10.1016/j.compbiomed.2021.104851] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 07/06/2021] [Revised: 09/05/2021] [Accepted: 09/05/2021] [Indexed: 12/28/2022]

Moon C, Jin C, Dong X, Abrar S, Zheng W, Chirkova RY, Tropsha A. Learning Drug-Disease-Target Embedding (DDTE) from knowledge graphs to inform drug repurposing hypotheses. J Biomed Inform 2021;119:103838. [PMID: 34119691 DOI: 10.1016/j.jbi.2021.103838] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/17/2020] [Revised: 05/10/2021] [Accepted: 06/08/2021] [Indexed: 10/21/2022]

Bresso E, Monnin P, Bousquet C, Calvier FE, Ndiaye NC, Petitpain N, Smaïl-Tabbone M, Coulet A. Investigating ADR mechanisms with Explainable AI: a feasibility study with knowledge graph mining. BMC Med Inform Decis Mak 2021;21:171. [PMID: 34039343 PMCID: PMC8157660 DOI: 10.1186/s12911-021-01518-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 05/05/2021] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

Adverse drug reactions (ADRs) are statistically characterized within randomized clinical trials and postmarketing pharmacovigilance, but their molecular mechanism remains unknown in most cases. This is true even for hepatic or skin toxicities, which are classically monitored during drug design. Aside from clinical trials, many elements of knowledge about drug ingredients are available in open-access knowledge graphs, such as their properties, interactions, or involvements in pathways. In addition, drug classifications that label drugs as either causative or not for several ADRs have been established.

METHODS

We propose in this paper to mine knowledge graphs for identifying biomolecular features that may enable automatically reproducing expert classifications that distinguish drugs causative or not for a given type of ADR. In an Explainable AI perspective, we explore simple classification techniques such as Decision Trees and Classification Rules because they provide human-readable models, which explain the classification itself, but may also provide elements of explanation for molecular mechanisms behind ADRs. In summary, (1) we mine a knowledge graph for features; (2) we train classifiers to distinguish, on the basis of extracted features, drugs associated or not with two commonly monitored ADRs: drug-induced liver injuries (DILI) and severe cutaneous adverse reactions (SCAR); (3) we isolate features that are both efficient in reproducing expert classifications and interpretable by experts (i.e., Gene Ontology terms, drug targets, or pathway names); and (4) we manually evaluate in a mini-study how they may be explanatory.

RESULTS

Extracted features reproduce with good fidelity classifications of drugs causative or not for DILI and SCAR (Accuracy = 0.74 and 0.81, respectively). Experts fully agreed that 73% and 38% of the most discriminative features are possibly explanatory for DILI and SCAR, respectively; and partially agreed (2/3) for 90% and 77% of them.

CONCLUSION

Knowledge graphs provide sufficiently diverse features to enable simple and explainable models to distinguish between drugs that are causative or not for ADRs. In addition to explaining classifications, most discriminative features appear to be good candidates for investigating ADR mechanisms further.


Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021;13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022] Open

Abstract

The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project.
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
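
The SPARQL-to-SQL translation mentioned in this abstract can be illustrated with a toy sketch. It is not the IDSM engine: it assumes a single generic `triples(subject, predicate, object)` table in an in-memory SQLite database rather than PostgreSQL, handles only one triple pattern with distinct variables, and performs no optimisation. The function name `pattern_to_sql` and the `ex:` identifiers are hypothetical.

```python
import sqlite3


def pattern_to_sql(s, p, o):
    """Translate one SPARQL-style triple pattern into SQL over a triples table.

    Terms starting with '?' are variables (selected as columns);
    all other terms are constants (bound as WHERE parameters).
    Assumption: each variable appears at most once in the pattern.
    """
    columns = ("subject", "predicate", "object")
    select, where, params = [], [], []
    for column, term in zip(columns, (s, p, o)):
        if term.startswith("?"):
            select.append(f"{column} AS {term[1:]}")
        else:
            where.append(f"{column} = ?")
            params.append(term)
    sql = "SELECT " + (", ".join(select) or "1") + " FROM triples"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:aspirin", "ex:inhibits", "ex:PTGS1"),
        ("ex:aspirin", "ex:inhibits", "ex:PTGS2"),
        ("ex:ibuprofen", "ex:inhibits", "ex:PTGS2"),
    ],
)

# Pattern equivalent to: SELECT ?drug WHERE { ?drug ex:inhibits ex:PTGS2 }
sql, params = pattern_to_sql("?drug", "ex:inhibits", "ex:PTGS2")
rows = conn.execute(sql + " ORDER BY 1", params).fetchall()
print(sql)
print(rows)
```

A production engine would additionally have to join multiple patterns, map IRIs to dataset-specific tables, and rewrite the resulting SQL for efficiency, which is the part the abstract highlights.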


MacLean F. Knowledge graphs and their applications in drug discovery. Expert Opin Drug Discov 2021;16:1057-1069. [PMID: 33843398 DOI: 10.1080/17460441.2021.1910673] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/09/2023]

CMG2Vec: A composite meta-graph based heterogeneous information network embedding approach. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106661] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]

Knowledge-Graph-Based Drug Repositioning against COVID-19 by Graph Convolutional Network with Attention Mechanism. Future Internet 2021. [DOI: 10.3390/fi13010013] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/06/2023] Open

Kanza S, Graham Frey J. Semantic Technologies in Drug Discovery. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open

Jarada TN, Rokne JG, Alhajj R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminform 2020;12:46. [PMID: 33431024 PMCID: PMC7374666 DOI: 10.1186/s13321-020-00450-7] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Received: 02/19/2020] [Accepted: 07/13/2020] [Indexed: 01/13/2023] Open

Li X, Rousseau JF, Ding Y, Song M, Lu W. Understanding Drug Repurposing From the Perspective of Biomedical Entities and Their Evolution: Bibliographic Research Using Aspirin. JMIR Med Inform 2020;8:e16739. [PMID: 32543442 PMCID: PMC7327595 DOI: 10.2196/16739] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/19/2019] [Revised: 01/08/2020] [Accepted: 03/31/2020] [Indexed: 12/26/2022] Open

Abstract

BACKGROUND

Drug development is still a costly and time-consuming process with a low rate of success. Drug repurposing (DR) has attracted considerable attention because of its significant advantages over traditional approaches in terms of development time, cost, and safety. Entitymetrics, defined as bibliometric indicators based on biomedical entities (eg, diseases, drugs, and genes) studied in the biomedical literature, make it possible for researchers to measure knowledge evolution and the transfer of drug research.

OBJECTIVE

The purpose of this study was to understand DR from the perspective of biomedical entities (diseases, drugs, and genes) and their evolution.

METHODS

In the work reported in this paper, we extended the bibliometric indicators of biomedical entities mentioned in PubMed to detect potential patterns of biomedical entities in various phases of drug research and investigate the factors driving DR. We used aspirin (acetylsalicylic acid) as the subject of the study since it can be repurposed for many applications. We propose 4 easy, transparent measures based on entitymetrics to investigate DR for aspirin: Popularity Index (P₁), Promising Index (P₂), Prestige Index (P₃), and Collaboration Index (CI).

RESULTS

We found that the maxima of P₁, P₃, and CI are closely associated with the different repurposing phases of aspirin. These metrics enabled us to observe the way in which biomedical entities interacted with the drug during the various phases of DR and to analyze the potential driving factors for DR at the entity level. P₁ and CI were indicative of the dynamic trends of a specific biomedical entity over a long time period, while P₂ was more sensitive to immediate changes. P₃ reflected the early signs of the practical value of biomedical entities and could be valuable for tracking the research frontiers of a drug.

CONCLUSIONS

In-depth studies of side effects and mechanisms, fierce market competition, and advanced life science technologies are driving factors for DR. This study showcases the way in which researchers can examine the evolution of DR using entitymetrics, an approach that can be valuable for enhancing decision making in the field of drug discovery and development.


Zhao T, Hu Y, Valsdottir LR, Zang T, Peng J. Identifying drug-target interactions based on graph convolutional network and deep neural network. Brief Bioinform 2020;22:2141-2150. [PMID: 32367110 DOI: 10.1093/bib/bbaa044] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 10/31/2019] [Revised: 03/05/2020] [Accepted: 03/06/2020] [Indexed: 12/21/2022] Open

Southan C. Opening up connectivity between documents, structures and bioactivity. Beilstein J Org Chem 2020;16:596-606. [PMID: 32280387 PMCID: PMC7136548 DOI: 10.3762/bjoc.16.54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/22/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open

Abstract

Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D" where a bioactivity "A" with a quantitative result "R" (e.g., an IC₅₀) is reported for chemical structure "C" that modulates (e.g., inhibits) a protein target "P". A useful shorthand for this connectivity thus becomes DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative for this is to increase the flow into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar time, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value but can be automatically extracted. This has been significantly accomplished for patents (e.g., by IBM, SureChEMBL and WIPO) for over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes.
Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into open databases. Progress may come from trends such as open science, open access (OA), findable, accessible, interoperable and reusable (FAIR), resource description framework (RDF) and WikiData. However, we will need to await the technical applicability in respect to DARCP capture to see if this opens up connectivity.
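
The difference between a full DARCP relationship and a D-C-only link, as described in this abstract, can be pictured as a small record type. This is only an illustrative sketch: the class name `DarcpRecord`, its field layout, and all identifier values are hypothetical placeholders and do not correspond to the schema of PubChem, ChEMBL, or any other database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DarcpRecord:
    """One document-activity-result-compound-protein relationship."""

    document: str                   # D: source paper or patent identifier
    compound: str                   # C: chemical structure identifier
    activity: Optional[str] = None  # A: type of bioactivity reported
    result: Optional[str] = None    # R: quantitative result, e.g. an IC50
    protein: Optional[str] = None   # P: protein target being modulated

    def is_full_darcp(self) -> bool:
        """True only when activity, result and protein are all present."""
        return None not in (self.activity, self.result, self.protein)

    def document_compound(self) -> Tuple[str, str]:
        """The D-C-only projection that automated extraction can provide."""
        return (self.document, self.compound)


# Placeholder values, invented for illustration only.
full = DarcpRecord(
    document="doc-001",
    compound="cmpd-42",
    activity="inhibition",
    result="IC50 = 12 nM",
    protein="target-7",
)
dc_only = DarcpRecord(document="doc-002", compound="cmpd-43")

print(full.is_full_darcp(), dc_only.is_full_darcp())
print(dc_only.document_compound())
```

Every record supports the D-C projection, but only curated records carry the A, R and P fields, which is the gap between automated and expert extraction that the abstract describes.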


Bizon C, Cox S, Balhoff J, Kebede Y, Wang P, Morton K, Fecho K, Tropsha A. ROBOKOP KG and KGB: Integrated Knowledge Graphs from Federated Sources. J Chem Inf Model 2019;59:4968-4973. [DOI: 10.1021/acs.jcim.9b00683] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]

Kumar R, Harilal S, Gupta SV, Jose J, Thomas Parambi DG, Uddin MS, Shah MA, Mathew B. Exploring the new horizons of drug repurposing: A vital tool for turning hard work into smart work. Eur J Med Chem 2019;182:111602. [PMID: 31421629 PMCID: PMC7127402 DOI: 10.1016/j.ejmech.2019.111602] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/25/2019] [Revised: 08/07/2019] [Accepted: 08/07/2019] [Indexed: 02/07/2023]

Li D, Madden A. Cascade embedding model for knowledge graph inference and retrieval. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2019.102093] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/01/2022]

Liang X, Li D, Song M, Madden A, Ding Y, Bu Y. Predicting biomedical relationships using the knowledge and graph embedding cascade model. PLoS One 2019;14:e0218264. [PMID: 31194807 PMCID: PMC6565371 DOI: 10.1371/journal.pone.0218264] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/18/2018] [Accepted: 05/29/2019] [Indexed: 02/06/2023] Open

Abstract

Advances in machine learning and deep learning methods, together with the increasing availability of large-scale pharmacological, genomic, and chemical datasets, have created opportunities for identifying potentially useful relationships within biochemical networks. Knowledge embedding models have been found to have value in detecting knowledge-based correlations among entities, but little effort has been made to apply them to networks of biochemical entities. This is because such networks tend to be unbalanced and sparse, and knowledge embedding models do not work well on them. However, to some extent, the shortcomings of knowledge embedding models can be compensated for if they are used in association with graph embedding. In this paper, we combine knowledge embedding and graph embedding to represent biochemical entities and their relations as dense and low-dimensional vectors. We build a cascade learning framework which incorporates semantic features from the knowledge embedding model, and graph features from the graph embedding model, to score the probability of linking. The proposed method performs noticeably better than the models with which it is compared. It predicted links and entities with an accuracy of 93%, and its average hits@10 score showed an 8.6% absolute improvement over the original knowledge embedding model and a 1.1% to 9.7% absolute improvement over other knowledge and graph embedding algorithms. In addition, we designed a meta-path algorithm to detect path relations in the biomedical network. Case studies further verify the value of the proposed model in finding potential relationships between diseases, drugs, genes, treatments, etc. Among the findings of the proposed model is the suggestion that VDR (vitamin D receptor) may be linked to prostate cancer. This is backed by evidence from medical databases and published research, supporting the suggestion that our proposed model could be of value to biomedical researchers.
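
For readers unfamiliar with the hits@10 metric cited in this abstract, the sketch below shows the usual definition of hits@k for link prediction: the fraction of test queries whose correct entity appears among the top k ranked candidates. It is a generic illustration, not the authors' evaluation code; the function name `hits_at_k` and the toy rankings are assumptions made for the example.

```python
def hits_at_k(ranked_candidates, true_entities, k=10):
    """Fraction of queries whose true entity is within the top-k candidates.

    ranked_candidates: one list per query, best-scoring candidate first.
    true_entities: the correct entity for each query, in the same order.
    """
    if not ranked_candidates:
        raise ValueError("need at least one query")
    if len(ranked_candidates) != len(true_entities):
        raise ValueError("rankings and truths must have the same length")
    hits = sum(
        1
        for ranking, truth in zip(ranked_candidates, true_entities)
        if truth in ranking[:k]
    )
    return hits / len(ranked_candidates)


# Toy rankings, invented for illustration only.
rankings = [
    ["VDR", "TP53", "BRCA1"],
    ["EGFR", "KRAS", "VDR"],
    ["TP53", "EGFR", "KRAS"],
]
truths = ["VDR", "VDR", "BRCA1"]

print(hits_at_k(rankings, truths, k=1))
print(hits_at_k(rankings, truths, k=3))
```

An "absolute improvement" in this metric is the difference between two such fractions expressed in percentage points, rather than a relative change.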


Gao Z, Fu G, Ouyang C, Tsutsui S, Liu X, Yang J, Gessner C, Foote B, Wild D, Ding Y, Yu Q. edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. BMC Bioinformatics 2019;20:306. [PMID: 31238875 PMCID: PMC6593489 DOI: 10.1186/s12859-019-2914-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/17/2019] [Accepted: 05/24/2019] [Indexed: 11/23/2022] Open

Auto-Generated Physiological Chain Data for an Ontological Framework for Pharmacology and Mechanism of Action to Determine Suspected Drugs in Cases of Dysuria. Drug Saf 2019;42:1055-1069. [PMID: 31119651 DOI: 10.1007/s40264-019-00833-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]

Oulas A, Minadakis G, Zachariou M, Sokratous K, Bourdakou MM, Spyrou GM. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches. Brief Bioinform 2019;20:806-824. [PMID: 29186305 PMCID: PMC6585387 DOI: 10.1093/bib/bbx151] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 02/01/2023] Open

Ciallella HL, Zhu H. Advancing Computational Toxicology in the Big Data Era by Artificial Intelligence: Data-Driven and Mechanism-Driven Modeling for Chemical Toxicity. Chem Res Toxicol 2019;32:536-547. [PMID: 30907586 DOI: 10.1021/acs.chemrestox.8b00393] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/21/2022]

Kanza S, Frey JG. A new wave of innovation in Semantic web tools for drug discovery. Expert Opin Drug Discov 2019;14:433-444. [DOI: 10.1080/17460441.2019.1586880] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]

Zheng S, Dharssi S, Wu M, Li J, Lu Z. Text Mining for Drug Discovery. Methods Mol Biol 2019;1939:231-252. [PMID: 30848465 DOI: 10.1007/978-1-4939-9089-4_13] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 06/09/2023]

Byrne R, Schneider G. In Silico Target Prediction for Small Molecules. Methods Mol Biol 2019;1888:273-309. [PMID: 30519953 DOI: 10.1007/978-1-4939-8891-4_16] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 06/09/2023]

Luechtefeld T, Hartung T. Computational approaches to chemical hazard assessment. ALTEX-Alternatives to Animal Experimentation 2018;34:459-478. [PMID: 29101769 PMCID: PMC5848496 DOI: 10.14573/altex.1710141] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/14/2017] [Indexed: 01/10/2023]

Zhou Y, Huang J, Li H, Sun H, Peng Y, Xu Y. A semantic-rich similarity measure in heterogeneous information networks. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.05.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022]

Hu W, Qiu H, Huang J, Dumontier M. BioSearch: a semantic search engine for Bio2RDF. Database-The Journal of Biological Databases and Curation 2018;2017:4079799. [PMID: 29220451 PMCID: PMC5569678 DOI: 10.1093/database/bax059] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/28/2016] [Accepted: 07/10/2017] [Indexed: 12/14/2022]

Xue H, Li J, Xie H, Wang Y. Review of Drug Repositioning Approaches and Resources. Int J Biol Sci 2018;14:1232-1244. [PMID: 30123072 PMCID: PMC6097480 DOI: 10.7150/ijbs.24612] [Citation(s) in RCA: 314] [Impact Index Per Article: 52.3] [Received: 12/28/2017] [Accepted: 06/12/2018] [Indexed: 12/23/2022] Open

Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018;25:300-308. [PMID: 29346583 PMCID: PMC7378878 DOI: 10.1093/jamia/ocx121] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 05/30/2017] [Revised: 09/20/2017] [Accepted: 09/28/2017] [Indexed: 12/17/2022] Open

Affiliation(s)

Xiaoling Chen: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Anupama E Gururaj: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Burak Ozyurt: Center for Research in Biological Systems
Ruiling Liu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Ergin Soysal: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Trevor Cohen: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Firat Tiryaki: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Yueling Li: Center for Research in Biological Systems
Nansu Zong: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Min Jiang: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Deevakar Rogith: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Mandana Salimi: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Hyeon-Eui Kim: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Philippe Rocca-Serra: e-Research Centre, University of Oxford, Oxford, UK
Alejandra Gonzalez-Beltran: e-Research Centre, University of Oxford, Oxford, UK
Claudiu Farcas: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Todd Johnson: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
Ron Margolis: National Institutes of Health, Bethesda, MD, USA
George Alter: University of Michigan, Ann Arbor, MI, USA
Susanna-Assunta Sansone: e-Research Centre, University of Oxford, Oxford, UK
Ian M Fore: National Institutes of Health, Bethesda, MD, USA
Lucila Ohno-Machado: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
Jeffrey S Grethe: Center for Research in Biological Systems
Hua Xu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA


Liu J, Ning X. Differential Compound Prioritization via Bidirectional Selectivity Push with Power. J Chem Inf Model 2017;57:2958-2975. [PMID: 29178784 DOI: 10.1021/acs.jcim.7b00552] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Sam E, Athri P. Web-based drug repurposing tools: a survey. Brief Bioinform 2017;20:299-316. [DOI: 10.1093/bib/bbx125] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 07/03/2017] [Indexed: 12/15/2022] Open

Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017;18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance.

METHOD

We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions.

RESULTS

VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units, yielding significantly improved computational time.
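The core idea of logistic matrix factorization mentioned in this abstract can be illustrated with a minimal sketch: each drug and target gets a latent vector, the interaction probability is the logistic function of their dot product, and the expected number of hits in a target set is the sum of those probabilities. This is only an illustration under simplifying assumptions; it does not implement the variational Bayesian inference, multiple kernels, observation weights, or graph Laplacian regularization of VB-MK-LMF, and the function names and latent vectors are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def interaction_probability(drug_vec, target_vec):
    """P(interaction) = sigmoid(u . v) for latent vectors u (drug) and v (target)."""
    score = sum(u * v for u, v in zip(drug_vec, target_vec))
    return sigmoid(score)


def expected_hits(drug_vec, target_vecs):
    """Expected number of interacting targets: the sum of per-target probabilities."""
    return sum(interaction_probability(drug_vec, t) for t in target_vecs)


# Hypothetical 2-dimensional latent vectors, for illustration only.
drug = [1.0, -0.5]
targets = [[2.0, 0.0], [0.0, 0.0], [-1.0, 1.0]]

probs = [interaction_probability(drug, t) for t in targets]
promiscuity = expected_hits(drug, targets)
```

Because every prediction is a probability, a promiscuity estimate for a drug is just a sum over a chosen target set; no thresholding is needed.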

CONCLUSION

In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.

Djokic-Petrovic M, Cvjetkovic V, Yang J, Zivanovic M, Wild DJ. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets. J Biomed Semantics 2017;8:42. [PMID: 28931422 PMCID: PMC5607505 DOI: 10.1186/s13326-017-0151-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/29/2017] [Accepted: 09/12/2017] [Indexed: 01/08/2023] Open

Abstract

BACKGROUND

There is a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources.

RESULTS

PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm that processes named entities for use in a Vector Space Model with Cosine Similarity Measures. To our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows detection of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data sources. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results.
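The vector space model with cosine similarity named in this abstract can be sketched as follows. This is a minimal illustration, not the PIBAS FedSPARQL algorithm: it assumes whitespace tokens stand in for the extracted named entities, uses raw term counts as vector weights, and the labels and function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words vector; tokens stand in for extracted named entities."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical labels attached to items published under different URIs.
item_a = term_vector("aspirin acetylsalicylic acid")
item_b = term_vector("Acetylsalicylic acid aspirin")
item_c = term_vector("caffeine")

same = cosine_similarity(item_a, item_b)
different = cosine_similarity(item_a, item_c)
```

Items whose similarity exceeds a chosen threshold would be flagged as candidates for describing the same entity; the threshold itself is a tuning decision not specified in the abstract.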

CONCLUSIONS

The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.

Van Den Driessche G, Fourches D. Adverse drug reactions triggered by the common HLA-B*57:01 variant: a molecular docking study. J Cheminform 2017;9:13. [PMID: 28303164 PMCID: PMC5337232 DOI: 10.1186/s13321-017-0202-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/27/2016] [Accepted: 02/24/2017] [Indexed: 12/18/2022] Open

Abstract

BACKGROUND

Human leukocyte antigen (HLA) surface proteins are directly involved in idiosyncratic adverse drug reactions. Herein, we present a structure-based analysis of the common HLA-B*57:01 variant known to be responsible for several HLA-linked adverse effects such as the abacavir hypersensitivity syndrome.

METHODS

First, we analyzed three X-ray crystal structures involving the HLA-B*57:01 protein variant, the anti-HIV drug abacavir, and different co-binding peptides present in the antigen-binding cleft. We superimposed the three complexes and showed that abacavir had no significant conformational variation regardless of the co-binding peptide. Second, we self-docked abacavir in the HLA-B*57:01 antigen-binding cleft with and without peptide using Glide. Third, we docked a small test set of 13 drugs with known ADRs and suspected HLA associations.

RESULTS

In the presence of an endogenous co-binding peptide, we found a significant stabilization (~2 kcal/mol) of the docking scores and identified several modified abacavir-peptide interactions indicating that the peptide does play a role in stabilizing the HLA-abacavir complex. Next, we used our model to dock a test set of 13 drugs at HLA-B*57:01 and measured their predicted binding affinities. Drug-specific interactions were observed at the antigen-binding cleft and we were able to discriminate the compounds with known HLA-B*57:01 liability from inactives.

CONCLUSIONS

Overall, our study highlights the relevance of molecular docking for evaluating and analyzing complex HLA-drug interactions. This is particularly important for virtual drug screening over thousands of HLA variants as other experimental techniques (e.g., in vitro HTS) and computational approaches (e.g., molecular dynamics) are more time consuming and expensive to conduct. As attention to drugs' HLA liability is on the rise, we believe this work helps encourage the use of molecular modeling for reliably studying and predicting HLA-drug interactions.

Paul Rupa A, Singh S, Zhu Q. GT2RDF: Semantic Representation of Genetic Testing Data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017;2016:1060-1069. [PMID: 28269903 PMCID: PMC5333271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]

Abstract

Accelerated by the Human Genome Project, genetic testing has become an increasingly integral component in diagnosis, treatment, management, and prevention of numerous diseases and conditions. More than 480 laboratories perform genetic tests for more than 4,600 rare and common medical conditions. These tests can effectively help health professionals to determine or predict the genetic conditions of their patients. However, physicians have not actively incorporated such innovative genetic technology into their clinical practices, according to two nationwide surveys commissioned by UnitedHealth Group. To address the insufficient use of a large number of genetic tests, we generated a single Resource Description Framework (RDF) resource, called GT2RDF (Genetic Testing data to RDF), by integrating information about disease, gene, phenotype, genetic test, and drug from multiple sources including Genetic Testing Registry (GTR), Online Mendelian Inheritance in Man (OMIM), MedGen, Human Phenotype Ontology (HPO), ClinVar, and National Drug File Reference Terminology (NDF-RT). In addition, we manually annotated and extracted information from 200 randomly selected GeneReviews chapters and integrated it into GT2RDF. We performed two case studies to demonstrate the usability of GT2RDF. GT2RDF will serve as a data foundation to support the design of a genetic testing recommendation system, called iGenetics, which will ultimately accelerate the pace of precision medicine by actively and effectively incorporating innovative genetic technology in clinical settings. Abbreviations: GT2RDF: Genetic Testing data to RDF; SWT: Semantic web technology; OWL: Ontology Web Language; RDF: Resource Description Framework; SPARQL: SPARQL Protocol and RDF Query Language; GTR: Genetic Testing Registry; OMIM: Online Mendelian Inheritance in Man; HPO: Human Phenotype Ontology; NDF-RT: National Drug File Reference Terminology; UMLS: Unified Medical Language System.

Luo Y, Uzuner Ö, Szolovits P. Bridging semantics and syntax with graph algorithms-state-of-the-art of extracting biomedical relations. Brief Bioinform 2017;18:160-178. [PMID: 26851224 PMCID: PMC5221425 DOI: 10.1093/bib/bbw001] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 09/24/2015] [Revised: 11/29/2015] [Indexed: 01/18/2023] Open

Stuermer M, Abu-Tayeh G, Myrach T. Digital sustainability: basic conditions for sustainable digital artifacts and their ecosystems. SUSTAINABILITY SCIENCE 2016;12:247-262. [PMID: 30174752 PMCID: PMC6106115 DOI: 10.1007/s11625-016-0412-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/29/2016] [Accepted: 11/05/2016] [Indexed: 06/01/2023]

Shen F, Lee Y. Knowledge Discovery from Biomedical Ontologies in Cross Domains. PLoS One 2016;11:e0160005. [PMID: 27548262 PMCID: PMC4993478 DOI: 10.1371/journal.pone.0160005] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 03/17/2016] [Accepted: 07/12/2016] [Indexed: 01/19/2023] Open

Iyappan A, Kawalia SB, Raschka T, Hofmann-Apitius M, Senger P. NeuroRDF: semantic integration of highly curated data to prioritize biomarker candidates in Alzheimer's disease. J Biomed Semantics 2016;7:45. [PMID: 27392431 PMCID: PMC4939021 DOI: 10.1186/s13326-016-0079-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/01/2015] [Accepted: 05/23/2016] [Indexed: 01/25/2023] Open

Abstract

BACKGROUND

Neurodegenerative diseases are incurable and debilitating indications with huge social and economic impact, where much is still to be learnt about the underlying molecular events. Mechanistic disease models could offer a knowledge framework to help decipher the complex interactions that occur at molecular and cellular levels. This motivates the need for the development of an approach integrating highly curated and heterogeneous data into a disease model of different regulatory data layers. Although several disease models exist, they often do not consider the quality of underlying data. Moreover, even with the current advancements in semantic web technology, we still do not have a cure for complex diseases like Alzheimer's disease. One of the key reasons for this could be the increasing gap between generated data and the derived knowledge.

RESULTS

In this paper, we describe an approach, called NeuroRDF, to develop an integrative framework for modeling curated knowledge in the area of complex neurodegenerative diseases. The core of this strategy lies in the usage of well curated and context specific data for integration into one single semantic web-based framework, RDF. This increases the probability that the derived knowledge is novel and reliable in a specific disease context. This infrastructure integrates highly curated data from databases (Bind, IntAct, etc.), literature (PubMed), and gene expression resources (such as GEO and ArrayExpress). We illustrate the effectiveness of our approach by asking real-world biomedical questions that link these resources to prioritize the plausible biomarker candidates. Among the 13 prioritized candidate genes, we identified MIF to be a potential emerging candidate due to its role as a pro-inflammatory cytokine. We additionally report on the effort and challenges faced during generation of such an indication-specific knowledge base comprising curated and quality-controlled data.

CONCLUSION

Although many alternative approaches have been proposed and practiced for modeling diseases, semantic web technology is a flexible and well established solution for harmonized aggregation. The benefit of this work, which uses high quality and context specific data, becomes apparent in speculating on previously unattended biomarker candidates around a well-known mechanism, further leveraged for experimental investigations.

Kaalia R, Ghosh I. Semantics based approach for analyzing disease-target associations. J Biomed Inform 2016;62:125-35. [PMID: 27349858 DOI: 10.1016/j.jbi.2016.06.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2016] [Revised: 06/23/2016] [Accepted: 06/24/2016] [Indexed: 12/16/2022]

Galgonek J, Hurt T, Michlíková V, Onderka P, Schwarz J, Vondrášek J. Advanced SPARQL querying in small molecule databases. J Cheminform 2016;8:31. [PMID: 27275187 PMCID: PMC4893829 DOI: 10.1186/s13321-016-0144-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/06/2015] [Accepted: 05/25/2016] [Indexed: 11/14/2022] Open

Shen F, Liu H, Sohn S, Larson DW, Lee Y. Predicate Oriented Pattern Analysis for Biomedical Knowledge Discovery. INTELLIGENT INFORMATION MANAGEMENT 2016;8:66-85. [PMID: 28983419 PMCID: PMC5626454 DOI: 10.4236/iim.2016.83006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]

Predicting drug target interactions using meta-path-based semantic network analysis. BMC Bioinformatics 2016;17:160. [PMID: 27071755 PMCID: PMC4830032 DOI: 10.1186/s12859-016-1005-x] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 04/30/2015] [Accepted: 03/31/2016] [Indexed: 12/02/2022] Open

Abstract

Background

In the context of drug discovery, drug target interactions (DTIs) can be predicted based on observed topological features of a semantic network across the chemical and biological space. In a semantic network, the types of the nodes and links are different. In order to take into account the heterogeneity of the semantic network, meta-path-based topological patterns were investigated for link prediction.

Results

Supervised machine learning models were constructed based on meta-path topological features of an enriched semantic network, which was derived from Chem2Bio2RDF, and was expanded by adding compound and protein similarity neighboring links obtained from the PubChem databases. The additional semantic links significantly improved the predictive performance of the supervised learning models. The binary classification model built upon the enriched feature space using the Random Forest algorithm significantly outperformed an existing semantic link prediction algorithm, Semantic Link Association Prediction (SLAP), to predict unknown links between compounds and protein targets in an evolving network. In addition to link prediction, Random Forest also has an intrinsic feature ranking algorithm, which can be used to select the important topological features that contribute to link prediction.
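A meta-path topological feature of the kind described here can be illustrated by counting, for a compound-protein pair, the walks whose node types follow a fixed type sequence. The sketch below is an assumption-laden toy: the node names, the tiny network, and the `count_meta_path` helper are hypothetical, and it does not reproduce the paper's feature normalization or its Random Forest model.

```python
from collections import defaultdict

# Hypothetical typed network: compound-compound edges stand in for similarity
# links, compound-protein edges for known interactions.
node_type = {
    "c1": "compound", "c2": "compound", "c3": "compound",
    "p1": "protein", "p2": "protein",
}
edges = [("c1", "c2"), ("c1", "c3"), ("c2", "p1"), ("c3", "p1"), ("c3", "p2")]


def build_adjacency(edge_list):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for a, b in edge_list:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_meta_path(adj, types, start, end, type_sequence):
    """Count walks from start to end whose node types match type_sequence."""
    if types[start] != type_sequence[0]:
        return 0
    frontier = {start: 1}  # node -> number of matching partial walks ending there
    for wanted in type_sequence[1:]:
        nxt = defaultdict(int)
        for node, n_walks in frontier.items():
            for nb in adj[node]:
                if types[nb] == wanted:
                    nxt[nb] += n_walks
        frontier = nxt
    return frontier.get(end, 0)


adj = build_adjacency(edges)
meta_path = ("compound", "compound", "protein")
feature_c1_p1 = count_meta_path(adj, node_type, "c1", "p1", meta_path)
feature_c1_p2 = count_meta_path(adj, node_type, "c1", "p2", meta_path)
```

One such count per meta-path gives a feature vector for each compound-protein pair, which a supervised classifier can then use for link prediction.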

Conclusions

The proposed framework has been demonstrated as a powerful alternative to SLAP in order to predict DTIs using the semantic network that integrates chemical, pharmacological, genomic, biological, functional, and biomedical information into a unified framework. It offers the flexibility to enrich the feature space by using different normalization processes on the topological features, and it can perform model construction and feature selection at the same time.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1005-x) contains supplementary material, which is available to authorized users.

Swainston N, Hastings J, Dekker A, Muthukrishnan V, May J, Steinbeck C, Mendes P. libChEBI: an API for accessing the ChEBI database. J Cheminform 2016;8:11. [PMID: 26933452 PMCID: PMC4772646 DOI: 10.1186/s13321-016-0123-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/19/2015] [Accepted: 02/16/2016] [Indexed: 01/29/2023] Open

Abstract

Background

ChEBI is a database and ontology of chemical entities of biological interest. It is widely used as a source of identifiers to facilitate unambiguous reference to chemical entities within biological models, databases, ontologies and literature. ChEBI contains a wealth of chemical data, covering over 46,500 distinct chemical entities, and related data such as chemical formula, charge, molecular mass, structure, synonyms and links to external databases. Furthermore, ChEBI is an ontology, and thus provides meaningful links between chemical entities. Unlike many other resources, ChEBI is fully human-curated, providing a reliable, non-redundant collection of chemical entities and related data. While ChEBI is supported by a web service for programmatic access and a number of download files, it does not have an API library to facilitate the use of ChEBI and its data in cheminformatics software.

Results

To provide this missing functionality, libChEBI, a comprehensive API library for accessing ChEBI data, is introduced. libChEBI is available in Java, Python and MATLAB versions from http://github.com/libChEBI, and provides full programmatic access to all data held within the ChEBI database through a simple and documented API. libChEBI is reliant upon the (automated) download and regular update of flat files that are held locally. As such, libChEBI can be embedded in both on- and off-line software applications.

Conclusions

libChEBI allows better support of ChEBI and its data in the development of new cheminformatics software. Covering three key programming languages, it allows for the entirety of the ChEBI database to be accessed easily and quickly through a simple API. All code is open access and freely available.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-016-0123-9) contains supplementary material, which is available to authorized users.

Hu B, Gifford E, Wang H, Bailey W, Johnson T. Analysis of the ToxCast Chemical-Assay Space Using the Comparative Toxicogenomics Database. Chem Res Toxicol 2015;28:2210-23. [PMID: 26505644 DOI: 10.1021/acs.chemrestox.5b00369] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]

Optimizing drug-target interaction prediction based on random walk on heterogeneous networks. J Cheminform 2015;7:40. [PMID: 26300984 PMCID: PMC4540752 DOI: 10.1186/s13321-015-0089-z] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 10/09/2014] [Accepted: 07/13/2015] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Predicting novel drug-target associations is important not only for developing new drugs, but also for furthering biological knowledge by understanding how drugs work and their modes of action. As more data about drugs, targets, and their interactions becomes available, computational approaches have become an indispensable part of drug-target association discovery. In this paper we apply the random walk with restart (RWR) method to a heterogeneous network of drugs and targets compiled from the DrugBank database and investigate the performance of the method under parameter variation and choice of chemical fingerprint methods.
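The random walk with restart idea can be sketched on a toy drug-target network: at each step the walker either follows a random edge or, with a fixed restart probability, jumps back to the seed drug, and the stationary visiting probabilities rank candidate targets. This is a minimal, unweighted illustration; the network, the restart value of 0.3, and the function name are assumptions, and the paper's fingerprint-based similarity weighting is not modeled.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * e_seed + (1 - restart) * W p until convergence."""
    nodes = sorted(adj)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for n in nodes:
            neighbours = adj[n]
            if not neighbours:
                # Dangling node: return its probability mass to the seed.
                nxt[seed] += (1.0 - restart) * p[n]
                continue
            share = (1.0 - restart) * p[n] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical network: d* are drugs, t* are targets; d1-d2 is a similarity edge.
network = {
    "d1": ["t1", "d2"],
    "d2": ["d1", "t1", "t2"],
    "t1": ["d1", "d2"],
    "t2": ["d2"],
    "t3": [],  # target with no known links
}

scores = random_walk_with_restart(network, seed="d1")
# Rank targets not yet linked to d1 by their stationary probability.
candidates = sorted(["t2", "t3"], key=lambda t: scores[t], reverse=True)
```

Here `t2` outranks `t3` because it is reachable from `d1` through the similar drug `d2`, which is the intuition behind using drug similarity edges in a heterogeneous network.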

RESULTS

We show that the choice of chemical fingerprint does not affect the performance of the method when the parameters are tuned to optimal values. We use a subset of the ChEMBL15 dataset that contains 2,763 associations between 544 drugs and 467 target proteins to evaluate our method, and we extracted datasets at bioactivity cutoffs of ≤1 μM and ≤10 μM. For the 1 μM bioactivity cutoff, we find that our method can correctly predict nearly 47%, 55%, and 60% of the given drug-target interactions in the test dataset having more than 0, 1, and 2 drug-target relations, respectively, for ChEMBL 1 μM dataset in top 50 rank positions. For the 10 μM bioactivity cutoff, we find that our method can correctly predict nearly 32.4%, 34.8%, and 35.3% of the given drug-target interactions in the test dataset having more than 0, 1, and 2 drug-target relations, respectively, for ChEMBL 1 μM dataset in top 50 rank positions. We further examine the associations between 110 popular top-selling drugs in 2012 and 3,519 targets and find the top ten targets for each drug.

CONCLUSIONS

We demonstrate the effectiveness and promise of the approach (RWR on heterogeneous networks using chemical features) for identifying novel drug-target interactions and investigate the performance.

Abinaya E, Narang P, Bhardwaj A. FROG - Fingerprinting Genomic Variation Ontology. PLoS One 2015;10:e0134693. [PMID: 26244889 PMCID: PMC4526677 DOI: 10.1371/journal.pone.0134693] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/11/2015] [Accepted: 07/13/2015] [Indexed: 11/19/2022] Open

Fu G, Batchelor C, Dumontier M, Hastings J, Willighagen E, Bolton E. PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform 2015;7:34. [PMID: 26175801 PMCID: PMC4500850 DOI: 10.1186/s13321-015-0084-4] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 02/02/2015] [Accepted: 06/22/2015] [Indexed: 12/02/2022] Open

Abstract

Background

PubChem is an open repository for chemical structures, biological activities and biomedical annotations. Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. Exposing PubChem data to Semantic Web services may help enable automated data integration and management, as well as facilitate interoperable web applications.

Description

This work, one of a series covering the PubChemRDF project, describes an approach to translate PubChem Substance and Compound information into Resource Description Framework (RDF) format. Basic examples are provided to demonstrate its use. The aim of this effort is to provide two new primary benefits to researchers in a cost-effective manner. Firstly, we aim to remove the inherent limitations of using the web-based resource PubChem by allowing a researcher to use readily available semantic technologies (namely, RDF triple stores and their corresponding SPARQL query engines) to query and analyze PubChem data on local computing resources. Secondly, this work intends to help improve data sharing, analysis, and integration of PubChem data to resources external to NCBI and across scientific domains, by means of the association of PubChem data to existing ontological frameworks, including CHEMical INFormation ontology, Semanticscience Integrated Ontology, and others.

Conclusions

With the goal of semantically describing information available in the PubChem archive, pre-existing ontological frameworks were used, rather than creating new ones. Semantic relationships between compounds and substances, chemical descriptors associated with compounds and substances, interrelationships between chemicals, as well as provenance and attribute metadata of substances are described.

Electronic supplementary material

The online version of this article (doi:10.1186/s13321-015-0084-4) contains supplementary material, which is available to authorized users.
