1
|
Semantic Data Visualisation for Biomedical Database Catalogues. Healthcare (Basel) 2022; 10:healthcare10112287. [DOI: 10.3390/healthcare10112287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/08/2022] [Accepted: 11/10/2022] [Indexed: 11/16/2022] Open
Abstract
Biomedical databases often have restricted access policies and governance rules. Thus, an adequate description of their content is essential for researchers who wish to use them for medical research. A strategy for publishing information without disclosing patient-level data is through database fingerprinting and aggregate characterisations. However, this information is still presented in a format that makes it challenging to search, analyse, and decide on the best databases for a domain of study. Several strategies allow one to visualise and compare the characteristics of multiple biomedical databases. Our study focused on a European platform for sharing and disseminating biomedical data. We use semantic data visualisation techniques to assist in comparing descriptive metadata from several databases. The great advantage lies in streamlining the database selection process, ensuring that sensitive details are not shared. To address this goal, we have considered two levels of data visualisation, one characterising a single database and the other involving multiple databases in network-level visualisations. This study revealed the impact of the proposed visualisations and some open challenges in representing semantically annotated biomedical datasets. Identifying future directions in this scope was one of the outcomes of this work.
|
2
|
Ebeid IA. MedGraph: A semantic biomedical information retrieval framework using knowledge graph embedding for PubMed. Front Big Data 2022; 5:965619. [PMID: 36338335 PMCID: PMC9627348 DOI: 10.3389/fdata.2022.965619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 09/20/2022] [Indexed: 01/24/2023] Open
Abstract
Here we study the semantic search and retrieval problem in biomedical digital libraries. First, we introduce MedGraph, a knowledge graph embedding-based method that provides semantic relevance retrieval and ranking for the biomedical literature indexed in PubMed. Second, we evaluate our approach using PubMed's Best Match algorithm. Moreover, we compare our method MedGraph to a traditional TF-IDF-based algorithm. Third, we use a dataset extracted from PubMed, including 30 million articles' metadata such as abstracts, author information, citation information, and extracted biological entity mentions. We pull a subset of the dataset to evaluate MedGraph using predefined queries with ground truth ranked results. To our knowledge, this technique has not been explored before in biomedical information retrieval. In addition, our results provide some evidence that semantic approaches to search and relevance in biomedical digital libraries that rely on knowledge graph modeling offer better search relevance results when compared with traditional methods in terms of objective metrics.
|
3
|
Yang JJ, Gessner CR, Duerksen JL, Biber D, Binder JL, Ozturk M, Foote B, McEntire R, Stirling K, Ding Y, Wild DJ. Knowledge graph analytics platform with LINCS and IDG for Parkinson's disease target illumination. BMC Bioinformatics 2022; 23:37. [PMID: 35021991 PMCID: PMC8756622 DOI: 10.1186/s12859-021-04530-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/12/2021] [Accepted: 12/13/2021] [Indexed: 11/12/2022] Open
Abstract
Background LINCS, "Library of Integrated Network-based Cellular Signatures", and IDG, "Illuminating the Druggable Genome", are both NIH projects and consortia that have generated rich datasets for the study of the molecular basis of human health and disease. LINCS L1000 expression signatures provide unbiased systems/omics experimental evidence. IDG provides compiled and curated knowledge for illumination and prioritization of novel drug target hypotheses. Together, these resources can support a powerful new approach to identifying novel drug targets for complex diseases, such as Parkinson's disease (PD), which continues to inflict severe harm on human health, and resist traditional research approaches. Results Integrating LINCS and IDG, we built the Knowledge Graph Analytics Platform (KGAP) to support an important use case: identification and prioritization of drug target hypotheses for associated diseases. The KGAP approach includes strong semantics interpretable by domain scientists and a robust, high performance implementation of a graph database and related analytical methods. Illustrating the value of our approach, we investigated results from queries relevant to PD. Approved PD drug indications from IDG’s resource DrugCentral were used as starting points for evidence paths exploring chemogenomic space via LINCS expression signatures for associated genes, evaluated as target hypotheses by integration with IDG. The KG-analytic scoring function was validated against a gold standard dataset of genes associated with PD as elucidated, published mechanism-of-action drug targets, also from DrugCentral. IDG's resource TIN-X was used to rank and filter KGAP results for novel PD targets, and one, SYNGR3 (Synaptogyrin-3), was manually investigated further as a case study and plausible new drug target for PD. 
Conclusions The synergy of LINCS and IDG, via KG methods, empowers graph analytics methods for the investigation of the molecular basis of complex diseases, and specifically for identification and prioritization of novel drug targets. The KGAP approach enables downstream applications via integration with resources similarly aligned with modern KG methodology. The generality of the approach indicates that KGAP is applicable to many disease areas, in addition to PD, the focus of this paper. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04530-9.
|
4
|
Shaker B, Ahmad S, Lee J, Jung C, Na D. In silico methods and tools for drug discovery. Comput Biol Med 2021; 137:104851. [PMID: 34520990 DOI: 10.1016/j.compbiomed.2021.104851] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 07/06/2021] [Revised: 09/05/2021] [Accepted: 09/05/2021] [Indexed: 12/28/2022]
Abstract
In the past, conventional drug discovery strategies have been successfully employed to develop new drugs, but the process from lead identification to clinical trials takes more than 12 years and costs approximately $1.8 billion USD on average. Recently, in silico approaches have been attracting considerable interest because of their potential to accelerate drug discovery in terms of time, labor, and costs. Many new drug compounds have been successfully developed using computational methods. In this review, we briefly introduce computational drug discovery strategies and outline up-to-date tools to perform the strategies as well as available knowledge bases for those who develop their own computational models. Finally, we introduce successful examples of anti-bacterial, anti-viral, and anti-cancer drug discoveries that were made using computational methods.
Affiliation(s)
- Bilal Shaker
  - Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Sajjad Ahmad
  - Department of Health and Biological Sciences, Abasyn University, Peshawar, 25000, Pakistan
- Jingyu Lee
  - Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Chanjin Jung
  - Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Dokyun Na
  - Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
|
5
|
Moon C, Jin C, Dong X, Abrar S, Zheng W, Chirkova RY, Tropsha A. Learning Drug-Disease-Target Embedding (DDTE) from knowledge graphs to inform drug repurposing hypotheses. J Biomed Inform 2021; 119:103838. [PMID: 34119691 DOI: 10.1016/j.jbi.2021.103838] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/17/2020] [Revised: 05/10/2021] [Accepted: 06/08/2021] [Indexed: 10/21/2022]
Abstract
We aimed to develop and validate a new graph embedding algorithm for embedding drug-disease-target networks to generate novel drug repurposing hypotheses. Our model denotes drugs, diseases and targets as subjects, predicates and objects, respectively. Each entity is represented by a multidimensional vector and the predicate is regarded as a translation vector from a subject vector to an object vector. These vectors are optimized so that when a subject-predicate-object triple represents a known drug-disease-target relationship, the summed vector of the subject and the predicate is close to that of the object; otherwise, the summed vector is distant from the object. The DTINet dataset was utilized to test this algorithm and discover unknown links between drugs and diseases. In cross-validation experiments, this new algorithm outperformed the original DTINet model. The MRR (Mean Reciprocal Rank) values of our models were around 0.80 while those of the original model were about 0.70. In addition, we have identified and verified several pairs of new therapeutic relations as well as adverse effect relations that were not recorded in the original DTINet dataset. This approach showed excellent performance, and the predicted drug-disease and drug-side-effect relationships were found to be consistent with literature reports. This novel method can be used to analyze diverse types of emerging biomedical and healthcare-related knowledge graphs (KG).
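The translation idea in this abstract (subject vector plus predicate vector should land near the object vector for a known triple) and the MRR metric can be illustrated with a minimal sketch. The vectors, entity names, and the choice of Euclidean distance below are illustrative assumptions, not taken from the paper, and no training step is shown.

```python
import math


def translation_distance(subject, predicate, obj):
    """Euclidean distance between (subject + predicate) and the object vector."""
    return math.sqrt(sum((s + p - o) ** 2 for s, p, o in zip(subject, predicate, obj)))


def rank_of_true_object(subject, predicate, candidates, true_name):
    """1-based rank of the true object when candidates are sorted by distance."""
    ordered = sorted(
        candidates,
        key=lambda name: translation_distance(subject, predicate, candidates[name]),
    )
    return ordered.index(true_name) + 1


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all evaluated triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical 2-dimensional embeddings (made up for illustration).
drug = [0.0, 1.0]        # subject
disease = [1.0, 0.0]     # predicate, used as a translation vector
targets = {              # candidate objects
    "target_a": [1.0, 1.0],
    "target_b": [3.0, 3.0],
    "target_c": [0.9, 0.2],
}

ranks = [
    rank_of_true_object(drug, disease, targets, "target_a"),
    rank_of_true_object(drug, disease, targets, "target_c"),
]
print(ranks, mean_reciprocal_rank(ranks))
```

A lower distance means a more plausible triple, so an MRR close to 1 indicates that true objects are usually ranked first.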
Affiliation(s)
- Changsung Moon
  - Department of Computer Science, North Carolina State University, Raleigh, NC 27695, USA
- Chunming Jin
  - BRITE Institute and Department of Pharmaceutical Sciences, College of Health and Sciences, North Carolina Central University, Durham, NC 27707, USA
- Xialan Dong
  - BRITE Institute and Department of Pharmaceutical Sciences, College of Health and Sciences, North Carolina Central University, Durham, NC 27707, USA
- Saad Abrar
  - Department of Computer Science, North Carolina State University, Raleigh, NC 27695, USA
- Weifan Zheng
  - BRITE Institute and Department of Pharmaceutical Sciences, College of Health and Sciences, North Carolina Central University, Durham, NC 27707, USA; UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599, USA
- Rada Y Chirkova
  - Department of Computer Science, North Carolina State University, Raleigh, NC 27695, USA
- Alexander Tropsha
  - UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599, USA
|
6
|
Bresso E, Monnin P, Bousquet C, Calvier FE, Ndiaye NC, Petitpain N, Smaïl-Tabbone M, Coulet A. Investigating ADR mechanisms with Explainable AI: a feasibility study with knowledge graph mining. BMC Med Inform Decis Mak 2021; 21:171. [PMID: 34039343 PMCID: PMC8157660 DOI: 10.1186/s12911-021-01518-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/15/2020] [Accepted: 05/05/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Adverse drug reactions (ADRs) are statistically characterized within randomized clinical trials and postmarketing pharmacovigilance, but their molecular mechanism remains unknown in most cases. This is true even for hepatic or skin toxicities, which are classically monitored during drug design. Aside from clinical trials, many elements of knowledge about drug ingredients are available in open-access knowledge graphs, such as their properties, interactions, or involvements in pathways. In addition, drug classifications that label drugs as either causative or not for several ADRs, have been established. METHODS We propose in this paper to mine knowledge graphs for identifying biomolecular features that may enable automatically reproducing expert classifications that distinguish drugs causative or not for a given type of ADR. In an Explainable AI perspective, we explore simple classification techniques such as Decision Trees and Classification Rules because they provide human-readable models, which explain the classification itself, but may also provide elements of explanation for molecular mechanisms behind ADRs. In summary, (1) we mine a knowledge graph for features; (2) we train classifiers at distinguishing, on the basis of extracted features, drugs associated or not with two commonly monitored ADRs: drug-induced liver injuries (DILI) and severe cutaneous adverse reactions (SCAR); (3) we isolate features that are both efficient in reproducing expert classifications and interpretable by experts (i.e., Gene Ontology terms, drug targets, or pathway names); and (4) we manually evaluate in a mini-study how they may be explanatory. RESULTS Extracted features reproduce with a good fidelity classifications of drugs causative or not for DILI and SCAR (Accuracy = 0.74 and 0.81, respectively). 
Experts fully agreed that 73% and 38% of the most discriminative features are possibly explanatory for DILI and SCAR, respectively; and partially agreed (2/3) for 90% and 77% of them. CONCLUSION Knowledge graphs provide sufficiently diverse features to enable simple and explainable models to distinguish between drugs that are causative or not for ADRs. In addition to explaining classifications, most discriminative features appear to be good candidates for investigating ADR mechanisms further.
Affiliation(s)
- Emmanuel Bresso
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
  - Centre d’Investigations Cliniques Plurithématique 1433, Inserm 1116, CHRU de Nancy, Université de Lorraine, Nancy, France
- Pierre Monnin
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
  - Orange, Belfort, France
- Cédric Bousquet
  - Service de santé publique et information médicale, CHU de Saint Etienne, Saint Etienne, France
  - Sorbonne Université, Inserm, Université Paris 13, LIMICS, Paris, France
- François-Elie Calvier
  - Service de santé publique et information médicale, CHU de Saint Etienne, Saint Etienne, France
- Nadine Petitpain
  - Centre Régional de Pharmacovigilance, CHRU of Nancy, Nancy, France
- Adrien Coulet
  - Université de Lorraine, CNRS, Inria, LORIA, Nancy, France
  - Inria Paris, Paris, France
  - Centre de Recherche des Cordeliers, INSERM, Sorbonne Université, Université de Paris, Paris, France
|
7
|
Galgonek J, Vondrášek J. IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminform 2021; 13:38. [PMID: 33980298 PMCID: PMC8117646 DOI: 10.1186/s13321-021-00515-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/06/2021] [Accepted: 04/23/2021] [Indexed: 11/12/2022] Open
Abstract
The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project. 
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
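As a rough illustration of translating SPARQL-style triple patterns into SQL over relational storage, the sketch below maps a basic graph pattern onto self-joins of a single `triples(s, p, o)` table. This is a toy under stated assumptions: the one-table schema, the `bgp_to_sql` helper, the `ex:` identifiers, and the use of SQLite instead of PostgreSQL are all hypothetical, and the sketch does not reflect the actual IDSM engine, its schema, or its optimisations.

```python
import sqlite3


def bgp_to_sql(patterns, select_vars):
    """Translate triple patterns (terms starting with '?' are variables) into SQL.

    Each pattern becomes one alias of the hypothetical `triples` table; a
    variable shared between patterns becomes a join condition.
    """
    where, params, bindings = [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in bindings:
                    where.append(f"{ref} = {bindings[term]}")  # join on shared variable
                else:
                    bindings[term] = ref  # first occurrence defines the binding
            else:
                where.append(f"{ref} = ?")  # constant term, passed as a parameter
                params.append(term)
    columns = ", ".join(f"{bindings[v]} AS {v[1:]}" for v in select_vars)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Made-up example data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:aspirin", "rdf:type", "ex:Compound"),
        ("ex:aspirin", "ex:inhibits", "ex:COX1"),
        ("ex:caffeine", "rdf:type", "ex:Compound"),
    ],
)

# Roughly: SELECT ?c ?t WHERE { ?c rdf:type ex:Compound . ?c ex:inhibits ?t }
sql, params = bgp_to_sql(
    [("?c", "rdf:type", "ex:Compound"), ("?c", "ex:inhibits", "?t")],
    ["?c", "?t"],
)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

A production engine would also need to handle filters, optional patterns, federation, and datatype mapping, none of which this sketch attempts.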
Affiliation(s)
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry of the CAS, Flemingovo náměstí 2, 166 10, Prague 6, Czech Republic
|
8
|
Abstract
INTRODUCTION Knowledge graphs have proven to be promising systems of information storage and retrieval. Due to the recent explosion of heterogeneous multimodal data sources generated in the biomedical domain, and an industry shift toward a systems biology approach, knowledge graphs have emerged as attractive methods of data storage and hypothesis generation. AREAS COVERED In this review, the author summarizes the applications of knowledge graphs in drug discovery. They evaluate their utility; differentiating between academic exercises in graph theory, and useful tools to derive novel insights, highlighting target identification and drug repurposing as two areas showing particular promise. They provide a case study on COVID-19, summarizing the research that used knowledge graphs to identify repurposable drug candidates. They describe the dangers of degree and literature bias, and discuss mitigation strategies. EXPERT OPINION Whilst knowledge graphs and graph-based machine learning have certainly shown promise, they remain relatively immature technologies. Many popular link prediction algorithms fail to address strong biases in biomedical data, and only highlight biological associations, failing to model causal relationships in complex dynamic biological systems. These problems need to be addressed before knowledge graphs reach their true potential in drug discovery.
Affiliation(s)
- Finlay MacLean
  - Target Identification, BenevolentAI, United Kingdom of Great Britain and Northern Ireland
|
9
|
CMG2Vec: A composite meta-graph based heterogeneous information network embedding approach. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106661] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
|
10
|
Knowledge-Graph-Based Drug Repositioning against COVID-19 by Graph Convolutional Network with Attention Mechanism. Future Internet 2021. [DOI: 10.3390/fi13010013] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/06/2023] Open
Abstract
The current global crisis caused by COVID-19 almost halted normal life in most parts of the world. Due to the long development cycle for new drugs, drug repositioning becomes an effective method of screening drugs for COVID-19. To find suitable drugs for COVID-19, we add COVID-19-related information into our medical knowledge graph and utilize a knowledge-graph-based drug repositioning method to screen potential therapeutic drugs for COVID-19. Specific steps are as follows. Firstly, the information about COVID-19 is collected from the latest published literature, and gene targets of COVID-19 are added to the knowledge graph. Then, the COVID-19 information in the knowledge graph is extracted and a drug–disease interaction prediction model based on Graph Convolutional Network with Attention (Att-GCN) is established. Att-GCN is used to extract features from the knowledge graph, and the prediction matrix is reconstructed through matrix operations. We evaluate the model by predicting drugs for both ordinary diseases and COVID-19. The model can achieve an area under the curve (AUC) of 0.954 and an area under the precision-recall curve (AUPR) of 0.851 for ordinary diseases. In the drug repositioning experiment for COVID-19, five drugs predicted by the model have proved effective in clinical treatment. The experimental results confirm that the model can predict drug–disease interaction effectively for both ordinary diseases and COVID-19.
|
11
|
Kanza S, Graham Frey J. Semantic Technologies in Drug Discovery. Systems Medicine 2021. [DOI: 10.1016/b978-0-12-801238-3.11520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
|
12
|
Jarada TN, Rokne JG, Alhajj R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J Cheminform 2020; 12:46. [PMID: 33431024 PMCID: PMC7374666 DOI: 10.1186/s13321-020-00450-7] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Received: 02/19/2020] [Accepted: 07/13/2020] [Indexed: 01/13/2023] Open
Abstract
Drug repositioning is the process of identifying novel therapeutic potentials for existing drugs and discovering therapies for untreated diseases. Drug repositioning, therefore, plays an important role in optimizing the pre-clinical process of developing novel drugs by saving time and cost compared to the traditional de novo drug discovery processes. Since drug repositioning relies on data for existing drugs and diseases, the enormous growth of publicly available large-scale biological, biomedical, and electronic health-related data, along with high-performance computing capabilities, has accelerated the development of computational drug repositioning approaches. Multidisciplinary researchers and scientists have carried out numerous attempts, with different degrees of efficiency and success, to computationally study the potential of repositioning drugs to identify alternative drug indications. This study reviews recent advancements in the field of computational drug repositioning. First, we highlight different drug repositioning strategies and provide an overview of frequently used resources. Second, we summarize computational approaches that are extensively used in drug repositioning studies. Third, we present different computing and experimental models to validate computational methods. Fourth, we address prospective opportunities, including a few target areas. Finally, we discuss challenges and limitations encountered in computational drug repositioning and conclude with an outline of further research directions.
Affiliation(s)
- Tamer N Jarada
  - Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
- Jon G Rokne
  - Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
- Reda Alhajj
  - Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
  - Department of Computer Engineering, Istanbul Medipol University, Istanbul, Turkey
|
13
|
Li X, Rousseau JF, Ding Y, Song M, Lu W. Understanding Drug Repurposing From the Perspective of Biomedical Entities and Their Evolution: Bibliographic Research Using Aspirin. JMIR Med Inform 2020; 8:e16739. [PMID: 32543442 PMCID: PMC7327595 DOI: 10.2196/16739] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/19/2019] [Revised: 01/08/2020] [Accepted: 03/31/2020] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Drug development is still a costly and time-consuming process with a low rate of success. Drug repurposing (DR) has attracted significant attention because of its significant advantages over traditional approaches in terms of development time, cost, and safety. Entitymetrics, defined as bibliometric indicators based on biomedical entities (eg, diseases, drugs, and genes) studied in the biomedical literature, make it possible for researchers to measure knowledge evolution and the transfer of drug research. OBJECTIVE The purpose of this study was to understand DR from the perspective of biomedical entities (diseases, drugs, and genes) and their evolution. METHODS In the work reported in this paper, we extended the bibliometric indicators of biomedical entities mentioned in PubMed to detect potential patterns of biomedical entities in various phases of drug research and investigate the factors driving DR. We used aspirin (acetylsalicylic acid) as the subject of the study since it can be repurposed for many applications. We propose 4 easy, transparent measures based on entitymetrics to investigate DR for aspirin: Popularity Index (P1), Promising Index (P2), Prestige Index (P3), and Collaboration Index (CI). RESULTS We found that the maxima of P1, P3, and CI are closely associated with the different repurposing phases of aspirin. These metrics enabled us to observe the way in which biomedical entities interacted with the drug during the various phases of DR and to analyze the potential driving factors for DR at the entity level. P1 and CI were indicative of the dynamic trends of a specific biomedical entity over a long time period, while P2 was more sensitive to immediate changes. P3 reflected the early signs of the practical value of biomedical entities and could be valuable for tracking the research frontiers of a drug. 
CONCLUSIONS In-depth studies of side effects and mechanisms, fierce market competition, and advanced life science technologies are driving factors for DR. This study showcases the way in which researchers can examine the evolution of DR using entitymetrics, an approach that can be valuable for enhancing decision making in the field of drug discovery and development.
Affiliation(s)
- Xin Li
  - Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China; School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, United States
- Justin F Rousseau
  - Department of Population Health and Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
- Ying Ding
  - School of Information, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
- Min Song
  - Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
- Wei Lu
  - Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China
|
14
|
Zhao T, Hu Y, Valsdottir LR, Zang T, Peng J. Identifying drug-target interactions based on graph convolutional network and deep neural network. Brief Bioinform 2020; 22:2141-2150. [PMID: 32367110 DOI: 10.1093/bib/bbaa044] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 10/31/2019] [Revised: 03/05/2020] [Accepted: 03/06/2020] [Indexed: 12/21/2022] Open
Abstract
Identification of new drug-target interactions (DTIs) is an important but time-consuming and costly step in drug discovery. In recent years, to mitigate these drawbacks, researchers have sought to identify DTIs using computational approaches. However, most existing methods construct drug networks and target networks separately, and then predict novel DTIs based on known associations between the drugs and targets without accounting for associations between drug-protein pairs (DPPs). To incorporate the associations between DPPs into DTI modeling, we built a DPP network based on multiple drugs and proteins in which DPPs are the nodes and the associations between DPPs are the edges of the network. We then propose a novel learning-based framework, 'graph convolutional network (GCN)-DTI', for DTI identification. The model first uses a graph convolutional network to learn the features for each DPP. Second, using the feature representation as an input, it uses a deep neural network to predict the final label. The results of our analysis show that the proposed framework outperforms some state-of-the-art approaches by a large margin.
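The first stage described in this abstract, a graph convolution over a network whose nodes are drug-protein pairs, can be sketched as a single propagation step. The symmetric normalisation used here, ReLU(D^-1/2 (A + I) D^-1/2 H W), is the commonly used GCN rule and is an assumption; the abstract does not state the exact layer, and the adjacency matrix, features, and weights below are made up. The subsequent deep neural network classifier is not shown.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so each node keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    transformed = matmul(matmul(norm, features), weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Hypothetical DPP network: three drug-protein-pair nodes, one association edge.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up node features
weights = [[0.5, -0.5], [0.25, 0.75]]             # made-up layer weights

embeddings = gcn_layer(adjacency, features, weights)
print(embeddings)
```

In a full pipeline, the resulting node embeddings would be fed to a classifier that predicts whether each drug-protein pair interacts.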
Affiliation(s)
- Tianyi Zhao
  - Department of Computer Science at Harbin Institute of Technology. He currently works as a bioinformatician in Beth Israel Deaconess Medical Center
- Yang Hu
  - Department of Life Science at Harbin Institute of Technology. His expertise is bioinformatics
- Linda R Valsdottir
  - MS in Biology and works as a scientific writer at the Smith Center for Outcomes Research in Cardiology at Beth Israel Deaconess Medical Center in Boston, MA. Her work is focused on helping researchers communicate their findings in an effort to translate novel analytical approaches and clinical expertise into improved outcomes for patients
- Tianyi Zang
  - School of Computer Science and Technology at Harbin Institute of Technology (HIT), China. Before joining HIT in 2009, he was a research fellow at the Department of Computer Science at University of Oxford, UK. His current research is concerned with biomedical bigdata computing and algorithms, deep-learning algorithms for network data, intelligent recommendation algorithms, and modeling and analysis methods for complex systems
- Jiajie Peng
  - School of Computer Science at Northwestern Polytechnical University. His expertise is computational biology and machine learning
Availability and implementation: https://github.com/zty2009/GCN-DNN/
|
15
|
Southan C. Opening up connectivity between documents, structures and bioactivity. Beilstein J Org Chem 2020; 16:596-606. [PMID: 32280387 PMCID: PMC7136548 DOI: 10.3762/bjoc.16.54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/22/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open
Abstract
Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D" where a bioactivity "A" with a quantitative result "R" (e.g., an IC50) is reported for chemical structure "C" that modulates (e.g., inhibits) a protein target "P". A useful shorthand for this connectivity thus becomes DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative for this is to increase the flow into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar time, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value but can be automatically extracted. This has been accomplished to a significant extent for patents (e.g., by IBM, SureChEMBL and WIPO), covering over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes.
Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into open databases. Progress may come from trends such as open science, open access (OA), findable, accessible, interoperable and reusable (FAIR), resource description framework (RDF) and WikiData. However, we will need to wait and see how technically applicable these are to DARCP capture before we know whether they open up connectivity.
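The DARCP shorthand defined in this abstract maps naturally onto a five-field record. The sketch below is one possible representation and not a schema used by any of the databases named above; the class name, field names, and the placeholder identifiers and values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class DARCP:
    """One document-activity-result-compound-protein relationship."""
    document: str   # D: the paper or patent reporting the result
    activity: str   # A: the bioactivity type and unit, e.g. an IC50
    result: float   # R: the quantitative value
    compound: str   # C: the chemical structure identifier
    protein: str    # P: the protein target modulated by the compound

def compounds_by_target(records):
    """Group compound identifiers by the protein target they act on."""
    grouped = defaultdict(set)
    for rec in records:
        grouped[rec.protein].add(rec.compound)
    return dict(grouped)

# Placeholder records; none of these identifiers or values are real data.
records = [
    DARCP("doc-1", "IC50 (nM)", 12.0, "compound-A", "target-X"),
    DARCP("doc-2", "IC50 (nM)", 85.0, "compound-B", "target-X"),
    DARCP("doc-2", "Ki (nM)", 3.5, "compound-A", "target-Y"),
]
index = compounds_by_target(records)
```

A document-to-compound (D-C-only) link, as discussed in the abstract, would correspond to a record where only the `document` and `compound` fields are known.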
Affiliation(s)
- Christopher Southan
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, EH8 9XD, UK.,TW2Informatics Ltd, Västra Frölunda, Gothenburg, 42166, Sweden
| |
|
16
|
Bizon C, Cox S, Balhoff J, Kebede Y, Wang P, Morton K, Fecho K, Tropsha A. ROBOKOP KG and KGB: Integrated Knowledge Graphs from Federated Sources. J Chem Inf Model 2019; 59:4968-4973. [DOI: 10.1021/acs.jcim.9b00683] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Affiliation(s)
- Chris Bizon
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27517, United States
| | - Steven Cox
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27517, United States
| | - James Balhoff
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27517, United States
| | - Yaphet Kebede
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27517, United States
| | - Patrick Wang
- CoVar Applied Technologies, Durham, North Carolina 27701, United States
| | - Kenneth Morton
- CoVar Applied Technologies, Durham, North Carolina 27701, United States
| | - Karamarie Fecho
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27517, United States
| | - Alexander Tropsha
- School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| |
|
17
|
Kumar R, Harilal S, Gupta SV, Jose J, Thomas Parambi DG, Uddin MS, Shah MA, Mathew B. Exploring the new horizons of drug repurposing: A vital tool for turning hard work into smart work. Eur J Med Chem 2019; 182:111602. [PMID: 31421629 PMCID: PMC7127402 DOI: 10.1016/j.ejmech.2019.111602] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/25/2019] [Revised: 08/07/2019] [Accepted: 08/07/2019] [Indexed: 02/07/2023]
Abstract
Drug discovery and development are long and financially taxing processes. On average, it takes 12-15 years and costs 1.2 billion USD for successful drug discovery and approval for clinical use. Many lead molecules are not developed further and their potential is not tapped to the fullest due to lack of resources or time constraints. In order for a drug to be approved by the FDA for clinical use, it must have excellent therapeutic potential in the desired area of target with minimal toxicities as supported by both pre-clinical and clinical studies. The targeted clinical evaluations fail to explore other potential therapeutic applications of the candidate drug. Drug repurposing or repositioning is a fast and relatively cheap alternative to the lengthy and expensive de novo drug discovery and development. Drug repositioning utilizes the already available clinical trials data for toxicity and adverse effects, and at the same time explores the drug's therapeutic potential for a different disease. This review addresses recent developments and the future scope of the drug repositioning strategy.
Affiliation(s)
- Rajesh Kumar
- Department of Pharmacy, Kerala University of Health Sciences, Thrissur, Kerala, India
| | - Seetha Harilal
- Department of Pharmacy, Kerala University of Health Sciences, Thrissur, Kerala, India
| | - Sheeba Varghese Gupta
- Department of Pharmaceutical Sciences, College of Pharmacy, University of South Florida, Tampa, FL, 33612, USA
| | - Jobin Jose
- Department of Pharmaceutics, NGSM Institute of Pharmaceutical Science, NITTE Deemed to be University, Mangalore, 575018, India
| | - Della Grace Thomas Parambi
- Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, Sakaka, Al Jouf, 2014, Saudi Arabia
| | - Md Sahab Uddin
- Department of Pharmacy, Southeast University, Dhaka, Bangladesh; Pharmakon Neuroscience Research Network, Dhaka, Bangladesh
| | - Muhammad Ajmal Shah
- Department of Pharmacognosy, Faculty of Pharmaceutical Sciences, Government College University, Faisalabad, Pakistan
| | - Bijo Mathew
- Division of Drug Design and Medicinal Chemistry Research Lab, Department of Pharmaceutical Chemistry, Ahalia School of Pharmacy, Palakkad, 678557, Kerala, India.
| |
|
19
|
Liang X, Li D, Song M, Madden A, Ding Y, Bu Y. Predicting biomedical relationships using the knowledge and graph embedding cascade model. PLoS One 2019; 14:e0218264. [PMID: 31194807 PMCID: PMC6565371 DOI: 10.1371/journal.pone.0218264] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/18/2018] [Accepted: 05/29/2019] [Indexed: 02/06/2023] Open
Abstract
Advances in machine learning and deep learning methods, together with the increasing availability of large-scale pharmacological, genomic, and chemical datasets, have created opportunities for identifying potentially useful relationships within biochemical networks. Knowledge embedding models have been found to have value in detecting knowledge-based correlations among entities, but little effort has been made to apply them to networks of biochemical entities. This is because such networks tend to be unbalanced and sparse, and knowledge embedding models do not work well on them. However, to some extent, the shortcomings of knowledge embedding models can be compensated for if they are used in association with graph embedding. In this paper, we combine knowledge embedding and graph embedding to represent biochemical entities and their relations as dense and low-dimensional vectors. We build a cascade learning framework which incorporates semantic features from the knowledge embedding model, and graph features from the graph embedding model, to score the probability of linking. The proposed method performs noticeably better than the models with which it is compared. It predicted links and entities with an accuracy of 93%, and its hits@10 score shows an average absolute improvement of 8.6% over the original knowledge embedding model and of 1.1% to 9.7% over other knowledge and graph embedding algorithms. In addition, we designed a meta-path algorithm to detect path relations in the biomedical network. Case studies further verify the value of the proposed model in finding potential relationships between diseases, drugs, genes, treatments, etc. Amongst the findings of the proposed model is the suggestion that VDR (vitamin D receptor) may be linked to prostate cancer. This is backed by evidence from medical databases and published research, supporting the suggestion that our proposed model could be of value to biomedical researchers.
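The hits@10 metric quoted in this abstract is commonly computed as the share of test cases in which the correct entity is ranked within the top 10 candidates. The sketch below shows that calculation; the function name and the example ranks are hypothetical and are not taken from the paper's evaluation.

```python
def hits_at_k(ranks, k=10):
    """Fraction of test cases whose correct entity is ranked at or above k.

    `ranks` holds the 1-based rank of the correct entity for each test case.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)

# Hypothetical ranks for five link-prediction test cases.
example_ranks = [1, 3, 12, 7, 25]
score = hits_at_k(example_ranks, k=10)  # three of the five ranks are <= 10
```

An "absolute improvement" between two models, as reported above, is then the difference between their two hits@10 values rather than a ratio.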
Affiliation(s)
- Xiaomin Liang
- School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Daifeng Li
- School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Min Song
- Department of Library and Information Science, Yonsei University, Seoul, Korea
| | - Andrew Madden
- School of Information Management, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Ying Ding
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
- School of Information Management, Wuhan University, Wuhan, Hubei, China
| | - Yi Bu
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
| |
|
20
|
Gao Z, Fu G, Ouyang C, Tsutsui S, Liu X, Yang J, Gessner C, Foote B, Wild D, Ding Y, Yu Q. edge2vec: Representation learning using edge semantics for biomedical knowledge discovery. BMC Bioinformatics 2019; 20:306. [PMID: 31238875 PMCID: PMC6593489 DOI: 10.1186/s12859-019-2914-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/17/2019] [Accepted: 05/24/2019] [Indexed: 11/23/2022] Open
Abstract
Background: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. Results: In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. Conclusions: We propose this method for its added value relative to existing graph analytical methodology, and for its applicability in the real-world context of biomedical knowledge discovery.
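To make the role of an edge-type transition matrix more concrete, the sketch below runs a random walk on a small typed graph in which the choice of the next edge is weighted by the type of the edge just traversed. This is a simplified illustration only: the matrix here is set by hand rather than trained by Expectation-Maximization, no embedding is learned, and the graph, node names, edge types, and weights are all hypothetical.

```python
import random

def edge_type_walk(graph, start, length, type_transition, rng):
    """Random walk whose next step is weighted by edge-type transitions.

    `graph` maps each node to a list of (neighbour, edge_type) pairs and
    `type_transition[a][b]` weights moving from an edge of type a to type b.
    """
    walk = [start]
    prev_type = None
    while len(walk) < length:
        neighbours = graph.get(walk[-1], [])
        if not neighbours:
            break
        if prev_type is None:
            weights = [1.0] * len(neighbours)  # first step: uniform choice
        else:
            weights = [type_transition[prev_type][etype]
                       for _, etype in neighbours]
        if sum(weights) == 0:
            break  # no permitted continuation from this node
        next_node, prev_type = rng.choices(neighbours, weights=weights, k=1)[0]
        walk.append(next_node)
    return walk

# Hypothetical heterogeneous graph with three edge types.
graph = {
    "drugA": [("geneX", "inhibits")],
    "geneX": [("drugA", "inhibits"), ("diseaseY", "associated"),
              ("geneZ", "coexpressed")],
    "diseaseY": [("geneX", "associated")],
    "geneZ": [("geneX", "coexpressed")],
}
# Hand-set weights; inhibits -> inhibits is 0 to forbid stepping straight back.
type_transition = {
    "inhibits": {"inhibits": 0.0, "associated": 0.7, "coexpressed": 0.3},
    "associated": {"inhibits": 0.5, "associated": 0.2, "coexpressed": 0.3},
    "coexpressed": {"inhibits": 0.5, "associated": 0.3, "coexpressed": 0.2},
}

walk = edge_type_walk(graph, "drugA", 6, type_transition, random.Random(0))
```

Walks generated this way could then be fed to a skip-gram style model to learn node embeddings, which is the general pattern the abstract describes.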
Affiliation(s)
- Zheng Gao
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | - Gang Fu
- Microsoft Corporation, Seattle, Washington, USA
| | | | - Satoshi Tsutsui
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | - Xiaozhong Liu
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | - Jeremy Yang
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.,Microsoft Corporation, Seattle, Washington, USA.,School of Medicine, University of New Mexico, Albuquerque, NM, USA
| | - Christopher Gessner
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA
| | | | - David Wild
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.,Data2Discovery, Inc., Bloomington, IN, USA
| | - Ying Ding
- School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.,Data2Discovery, Inc., Bloomington, IN, USA
| | - Qi Yu
- School of Management, Shanxi Medical University, Taiyuan, Shanxi, China.
| |
|
21
|
Auto-Generated Physiological Chain Data for an Ontological Framework for Pharmacology and Mechanism of Action to Determine Suspected Drugs in Cases of Dysuria. Drug Saf 2019; 42:1055-1069. [PMID: 31119651 DOI: 10.1007/s40264-019-00833-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
INTRODUCTION: Patients often take several different medications for multiple conditions concurrently. Therefore, when adverse drug events (ADEs) occur, it is necessary to consider the mechanisms responsible. Few approaches consider the mechanisms of ADEs, such as changes in physiological states. We proposed that the ontological framework for pharmacology and mechanism of action (pharmacodynamics) we developed could be used for this approach. However, the existing knowledge base contains little data on physiological chains (PCs). OBJECTIVE: We aimed to investigate a method for automatically generating missing PC from the viewpoint of anatomical structures. This study was conducted to determine dysuria-related adverse events more likely to occur during multidrug administration. METHODS: We adopted a systematic approach to determine drugs suspected to cause adverse events and incorporated existing data and data generated in our newly developed method into our ontological framework. The performance of automated data generation was evaluated using this newly developed system. Suspected drugs determined by the system were compared with those derived from adverse events databases. RESULTS: Of the 242 drugs involving suspected drug-induced urinary retention or dysuria, 26 suspected drugs were determined. Of these, five were drugs with side effects not listed in drug package inserts. The system derived potential mechanisms of action, PCs, and suspected drugs. CONCLUSION: Our method is novel in that it generates PC data from anatomical structural properties and could serve as a knowledge base for determining suspected drugs by potential mechanisms of action.
|
22
|
Oulas A, Minadakis G, Zachariou M, Sokratous K, Bourdakou MM, Spyrou GM. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches. Brief Bioinform 2019; 20:806-824. [PMID: 29186305 PMCID: PMC6585387 DOI: 10.1093/bib/bbx151] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 02/01/2023] Open
Abstract
Systems Bioinformatics is a relatively new approach, which lies at the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels by combining a bottom-up approach, as in systems biology, with a data-driven top-down approach, as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine.
Affiliation(s)
- Anastasis Oulas
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - George Minadakis
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Margarita Zachariou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Kleitos Sokratous
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Marilena M Bourdakou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - George M Spyrou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| |
|
23
|
Ciallella HL, Zhu H. Advancing Computational Toxicology in the Big Data Era by Artificial Intelligence: Data-Driven and Mechanism-Driven Modeling for Chemical Toxicity. Chem Res Toxicol 2019; 32:536-547. [PMID: 30907586 DOI: 10.1021/acs.chemrestox.8b00393] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Indexed: 12/21/2022]
Abstract
In 2016, the Frank R. Lautenberg Chemical Safety for the 21st Century Act became the first US legislation to advance chemical safety evaluations by utilizing novel testing approaches that reduce the testing of vertebrate animals. Central to this mission is the advancement of computational toxicology and artificial intelligence approaches to implementing innovative testing methods. In the current big data era, the terms volume (amount of data), velocity (growth of data), and variety (the diversity of sources) have been used to characterize the currently available chemical, in vitro, and in vivo data for toxicity modeling purposes. Furthermore, as suggested by various scientists, the variability (internal consistency or lack thereof) of publicly available data pools, such as PubChem, also presents significant computational challenges. The development of novel artificial intelligence approaches based on massive public toxicity data is urgently needed to generate new predictive models for chemical toxicity evaluations and make the developed models applicable as alternatives for evaluating untested compounds. In this procedure, traditional approaches (e.g., QSAR) purely based on chemical structures have been replaced by newly designed data-driven and mechanism-driven modeling. The resulting models realize the concept of adverse outcome pathway (AOP), which can not only directly evaluate toxicity potentials of new compounds, but also illustrate relevant toxicity mechanisms. The recent advancement of computational toxicology in the big data era has paved the road to future toxicity testing, which will significantly impact public health.
|
24
|
Kanza S, Frey JG. A new wave of innovation in Semantic web tools for drug discovery. Expert Opin Drug Discov 2019; 14:433-444. [DOI: 10.1080/17460441.2019.1586880] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Affiliation(s)
- Samantha Kanza
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
| | - Jeremy Graham Frey
- Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
| |
|
25
|
Abstract
Recent advances in technology have led to the exponential growth of scientific literature in biomedical sciences. This rapid increase in information has surpassed the threshold for manual curation efforts, necessitating the use of text mining approaches in the field of life sciences. One such application of text mining is in fostering in silico drug discovery such as drug target screening, pharmacogenomics, adverse drug event detection, etc. This chapter serves as an introduction to the applications of various text mining approaches in drug discovery. It is divided into two parts with the first half as an overview of text mining in the biosciences. The second half of the chapter reviews strategies and methods for four unique applications of text mining in drug discovery.
Affiliation(s)
- Si Zheng
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Shazia Dharssi
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
| | - Meng Wu
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Jiao Li
- Institute of Medical Information and Library, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.
| |
|
26
|
Abstract
Drugs modulate disease states through their actions on targets in the body. Determining these targets aids the focused development of new treatments, and helps to better characterize those already employed. One means of accomplishing this is through the deployment of in silico methodologies, harnessing computational analytical and predictive power to produce educated hypotheses for experimental verification. Here, we provide an overview of the current state of the art, describe some of the well-established methods in detail, and reflect on how they, and emerging technologies promoting the incorporation of complex and heterogeneous data-sets, can be employed to improve our understanding of (poly)pharmacology.
Affiliation(s)
- Ryan Byrne
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland
| | - Gisbert Schneider
- Department of Chemistry and Applied Biosciences, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland.
| |
|
27
|
Luechtefeld T, Hartung T. Computational approaches to chemical hazard assessment. ALTEX-ALTERNATIVES TO ANIMAL EXPERIMENTATION 2018; 34:459-478. [PMID: 29101769 PMCID: PMC5848496 DOI: 10.14573/altex.1710141] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 10/14/2017] [Indexed: 01/10/2023]
Abstract
Computational prediction of toxicity has reached new heights as a result of decades of growth in the magnitude and diversity of biological data. Public packages for statistics and machine learning make model creation faster. New theory in machine learning and cheminformatics enables integration of chemical structure, toxicogenomics, simulated and physical data in the prediction of chemical health hazards, and other toxicological information. Our earlier publications have characterized a toxicological dataset of unprecedented scale resulting from the European REACH legislation (Registration Evaluation Authorisation and Restriction of Chemicals). These publications dove into potential use cases for regulatory data and some models for exploiting this data. This article analyzes the options for the identification and categorization of chemicals, moves on to the derivation of descriptive features for chemicals, discusses different kinds of targets modeled in computational toxicology, and ends with a high-level perspective of the algorithms used to create computational toxicology models.
Affiliation(s)
- Thomas Luechtefeld
- Johns Hopkins Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA
| | - Thomas Hartung
- Johns Hopkins Center for Alternatives to Animal Testing (CAAT), Baltimore, MD, USA.,CAAT-Europe, University of Konstanz, Konstanz, Germany
| |
|
28
|
Zhou Y, Huang J, Li H, Sun H, Peng Y, Xu Y. A semantic-rich similarity measure in heterogeneous information networks. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.05.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022]
|
29
|
Hu W, Qiu H, Huang J, Dumontier M. BioSearch: a semantic search engine for Bio2RDF. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2018; 2017:4079799. [PMID: 29220451 PMCID: PMC5569678 DOI: 10.1093/database/bax059] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/28/2016] [Accepted: 07/10/2017] [Indexed: 12/14/2022]
Abstract
Biomedical data are growing at an incredible pace, and substantial expertise is required to organize them in a manner that makes them easily findable, accessible, interoperable and reusable. Massive effort has been devoted to using Semantic Web standards and technologies to create a network of Linked Data for the life sciences, among others. However, while these data are accessible through programmatic means, effective user interfaces for non-experts to SPARQL endpoints are few and far between. Contributing to user frustrations is that data are not necessarily described using common vocabularies, thereby making it difficult to aggregate results, especially when distributed across multiple SPARQL endpoints. We propose BioSearch, a semantic search engine that uses ontologies to enhance federated query construction and organize search results. BioSearch also features a simplified query interface that allows users to optionally filter their keywords according to classes, properties and datasets. User evaluation demonstrated that BioSearch is more effective and usable than two state of the art search and browsing solutions. Database URL: http://ws.nju.edu.cn/biosearch/
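For readers unfamiliar with federated queries over several SPARQL endpoints, the sketch below assembles a keyword search that uses the standard SPARQL 1.1 `SERVICE` clause, one per endpoint, joined with `UNION`. It is not BioSearch's query construction, which the abstract does not detail; the function name, the endpoint URLs, and the choice to match on `rdfs:label` are assumptions for illustration.

```python
def build_federated_query(keyword, endpoints):
    """Build a SPARQL 1.1 query that searches labels on several endpoints."""
    # Lower-case the keyword and escape characters special in SPARQL strings.
    escaped = keyword.lower().replace("\\", "\\\\").replace('"', '\\"')
    pattern = ('?entity rdfs:label ?label . '
               f'FILTER(CONTAINS(LCASE(STR(?label)), "{escaped}"))')
    # One SERVICE block per endpoint; UNION merges their result sets.
    blocks = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(blocks)
    return ("PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
            "SELECT DISTINCT ?entity ?label WHERE {\n" + body + "\n}")

# Hypothetical endpoint URLs, used only to show the query shape.
example_endpoints = [
    "http://example.org/dataset-a/sparql",
    "http://example.org/dataset-b/sparql",
]
query = build_federated_query("Aspirin", example_endpoints)
```

The resulting string can be submitted to any endpoint that supports federated queries; aggregating results across differing vocabularies, the difficulty raised in the abstract, is not addressed by this sketch.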
Affiliation(s)
- Wei Hu
- State Key Laboratory for Novel Software Technology, Nanjing University, China.,Institute of Data Science, Maastricht University, The Netherlands
| | - Honglei Qiu
- State Key Laboratory for Novel Software Technology, Nanjing University, China
| | - Jiacheng Huang
- State Key Laboratory for Novel Software Technology, Nanjing University, China
| | - Michel Dumontier
- Institute of Data Science, Maastricht University, The Netherlands
| |
|
30
|
Xue H, Li J, Xie H, Wang Y. Review of Drug Repositioning Approaches and Resources. Int J Biol Sci 2018; 14:1232-1244. [PMID: 30123072 PMCID: PMC6097480 DOI: 10.7150/ijbs.24612] [Citation(s) in RCA: 314] [Impact Index Per Article: 52.3] [Received: 12/28/2017] [Accepted: 06/12/2018] [Indexed: 12/23/2022] Open
Abstract
Drug discovery is a time-consuming, high-investment, and high-risk process in traditional drug development. Drug repositioning has become a popular strategy in recent years. Unlike traditional drug development strategies, this strategy is efficient, economical and riskless. There are usually three kinds of approaches: computational approaches, biological experimental approaches, and mixed approaches, all of which are widely used in drug repositioning. In this paper, we review computational approaches and highlight their characteristics to provide references for researchers to develop more powerful approaches. At the same time, the important findings obtained using these approaches are listed. Furthermore, we summarize 76 important resources about drug repositioning. Finally, challenges and opportunities in drug repositioning are discussed from multiple perspectives, including technology, commercial models, patents and investment.
Affiliation(s)
- Hanqing Xue
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| | - Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| | - Haozhe Xie
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, 150001, Harbin, China
| |
|
31
|
Chen X, Gururaj AE, Ozyurt B, Liu R, Soysal E, Cohen T, Tiryaki F, Li Y, Zong N, Jiang M, Rogith D, Salimi M, Kim HE, Rocca-Serra P, Gonzalez-Beltran A, Farcas C, Johnson T, Margolis R, Alter G, Sansone SA, Fore IM, Ohno-Machado L, Grethe JS, Xu H. DataMed - an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018; 25:300-308. [PMID: 29346583 PMCID: PMC7378878 DOI: 10.1093/jamia/ocx121] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 05/30/2017] [Revised: 09/20/2017] [Accepted: 09/28/2017] [Indexed: 12/17/2022] Open
Abstract
Objective: Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. Materials and Methods: DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Results and Conclusion: Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of the top 10 search results that are relevant) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community.
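Precision at 10, reported above as 0.6022, is the share of the top 10 returned results that are judged relevant, averaged over queries in a benchmark. The sketch below shows the per-query calculation; the function name and the relevance judgments are hypothetical and do not come from the DataMed benchmark.

```python
def precision_at_k(relevance, k=10):
    """Share of the top-k results that are relevant.

    `relevance` lists a truthy/falsy judgment per result, best-ranked first.
    Missing results (fewer than k returned) count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for judged in relevance[:k] if judged) / k

# Hypothetical judgments for one query's top 10 results.
judgments = [True, True, False, True, False, True, True, False, False, True]
p_at_10 = precision_at_k(judgments, k=10)  # six relevant results out of ten
```

Averaging this value over every query in a benchmark gives a single system-level P@10 figure of the kind quoted in the abstract.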
Affiliation(s)
- Xiaoling Chen
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Anupama E Gururaj
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ruiling Liu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ergin Soysal
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Trevor Cohen
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Firat Tiryaki
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Yueling Li
  - Center for Research in Biological Systems
- Nansu Zong
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Min Jiang
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Deevakar Rogith
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Mandana Salimi
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Hyeon-Eui Kim
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Claudiu Farcas
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Todd Johnson
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ron Margolis
  - National Institutes of Health, Bethesda, MD, USA
- Ian M Fore
  - National Institutes of Health, Bethesda, MD, USA
- Lucila Ohno-Machado
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
32
|
Liu J, Ning X. Differential Compound Prioritization via Bidirectional Selectivity Push with Power. J Chem Inf Model 2017; 57:2958-2975. [PMID: 29178784 DOI: 10.1021/acs.jcim.7b00552] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Affiliation(s)
- Junfeng Liu
  - Indiana University - Purdue University Indianapolis, 723 West Michigan Street, SL 280, Indianapolis, Indiana 46202, United States
- Xia Ning
  - Indiana University - Purdue University Indianapolis, 723 West Michigan Street, SL 280, Indianapolis, Indiana 46202, United States
  - Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 410 West 10th Street, HITS 5000, Indianapolis, Indiana 46202, United States
|
33
|
Sam E, Athri P. Web-based drug repurposing tools: a survey. Brief Bioinform 2017; 20:299-316. [DOI: 10.1093/bib/bbx125] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 07/03/2017] [Indexed: 12/15/2022] Open
Affiliation(s)
- Elizabeth Sam
  - Department of Computer Science & Engineering, Amrita University, Bengaluru, India
- Prashanth Athri
  - Department of Computer Science & Engineering, Amrita University, Bengaluru, India
|
34
|
Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017; 18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND: Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information, are also vital for reaching top performance. METHOD: We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. RESULTS: VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units, yielding significantly improved computational time. CONCLUSION: In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.
Affiliation(s)
- Bence Bolgár
  - Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
- Péter Antal
  - Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
|
35
|
Djokic-Petrovic M, Cvjetkovic V, Yang J, Zivanovic M, Wild DJ. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets. J Biomed Semantics 2017; 8:42. [PMID: 28931422 PMCID: PMC5607505 DOI: 10.1186/s13326-017-0151-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/29/2017] [Accepted: 09/12/2017] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND: There is a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. RESULTS: PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm that processes named entities for use in a Vector Space Model with Cosine Similarity Measures. To our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows the detection of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data sources. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results. CONCLUSIONS: The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly useful for suggesting new data sources and cost optimization for new experiments. PIBAS FedSPARQL can be expanded with new topics, subtopics and templates on demand, rendering information retrieval more robust.
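The abstract mentions comparing data items with a Vector Space Model and cosine similarity. The following is a minimal sketch of that general technique, not the PIBAS FedSPARQL algorithm itself: it assumes plain whitespace tokenisation and raw term counts instead of named-entity processing, and the helper names `term_vector` and `cosine_similarity` as well as the example labels are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical labels of two resources published under different URIs.
label_1 = term_vector("acetylsalicylic acid tablet")
label_2 = term_vector("Acetylsalicylic Acid")

print(round(cosine_similarity(label_1, label_2), 3))
```

Items whose score exceeds a chosen threshold could then be flagged as candidates for the same entity; the threshold would have to be tuned per dataset.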
Affiliation(s)
- Marija Djokic-Petrovic
  - Virtual World Services GmbH, Asperner Heldenplatz 6, 1220 Wien, Austria
  - Department of Mathematics and Informatics, Faculty of Science, University of Kragujevac, Radoja Domanovica 12, Kragujevac, 34000 Serbia
- Vladimir Cvjetkovic
  - Department of Mathematics and Informatics, Faculty of Science, University of Kragujevac, Radoja Domanovica 12, Kragujevac, 34000 Serbia
- Jeremy Yang
  - School of Informatics and Computing, Indiana University, 901 E 10th St, Bloomington, Indiana, 47408 USA
  - Translational Informatics Division, School of Medicine, University of New Mexico, Albuquerque, NM 87131 USA
- Marko Zivanovic
  - Department of Biology and Ecology, Faculty of Science, University of Kragujevac, Radoja Domanovica 12, Kragujevac, 34000 Serbia
- David J. Wild
  - School of Informatics and Computing, Indiana University, 901 E 10th St, Bloomington, Indiana, 47408 USA
|
36
|
Van Den Driessche G, Fourches D. Adverse drug reactions triggered by the common HLA-B*57:01 variant: a molecular docking study. J Cheminform 2017; 9:13. [PMID: 28303164 PMCID: PMC5337232 DOI: 10.1186/s13321-017-0202-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/27/2016] [Accepted: 02/24/2017] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND: Human leukocyte antigen (HLA) surface proteins are directly involved in idiosyncratic adverse drug reactions. Herein, we present a structure-based analysis of the common HLA-B*57:01 variant known to be responsible for several HLA-linked adverse effects such as the abacavir hypersensitivity syndrome. METHODS: First, we analyzed three X-ray crystal structures involving the HLA-B*57:01 protein variant, the anti-HIV drug abacavir, and different co-binding peptides present in the antigen-binding cleft. We superimposed the three complexes and showed that abacavir had no significant conformational variation regardless of the co-binding peptide. Second, we self-docked abacavir in the HLA-B*57:01 antigen-binding cleft with and without peptide using Glide. Third, we docked a small test set of 13 drugs with known ADRs and suspected HLA associations. RESULTS: In the presence of an endogenous co-binding peptide, we found a significant stabilization (~2 kcal/mol) of the docking scores and identified several modified abacavir-peptide interactions indicating that the peptide does play a role in stabilizing the HLA-abacavir complex. Next, our model was used to dock a test set of 13 drugs at HLA-B*57:01 and to measure their predicted binding affinities. Drug-specific interactions were observed at the antigen-binding cleft and we were able to discriminate the compounds with known HLA-B*57:01 liability from inactives. CONCLUSIONS: Overall, our study highlights the relevance of molecular docking for evaluating and analyzing complex HLA-drug interactions. This is particularly important for virtual drug screening over thousands of HLA variants, as other experimental techniques (e.g., in vitro HTS) and computational approaches (e.g., molecular dynamics) are more time consuming and expensive to conduct. As attention to drugs' HLA liability is on the rise, we believe this work helps encourage the use of molecular modeling for reliably studying and predicting HLA-drug interactions.
Affiliation(s)
- George Van Den Driessche
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC USA
- Denis Fourches
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC USA
|
37
|
Paul Rupa A, Singh S, Zhu Q. GT2RDF: Semantic Representation of Genetic Testing Data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:1060-1069. [PMID: 28269903 PMCID: PMC5333271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Accelerated by the Human Genome Project, genetic testing has become an increasingly integral component in diagnosis, treatment, management, and prevention of numerous diseases and conditions. More than 480 laboratories perform genetic tests for more than 4,600 rare and common medical conditions. These tests can effectively help health professionals to determine or predict the genetic conditions of their patients. However, physicians have not actively incorporated such innovative genetic technology into their clinical practices, according to two nationwide surveys commissioned by UnitedHealth Group. To address the insufficient use of a large number of genetic tests, we generated a single Resource Description Framework (RDF) resource, called GT2RDF (Genetic Testing data to RDF), by integrating information about disease, gene, phenotype, genetic test, and drug from multiple sources including Genetic Testing Registry (GTR), Online Mendelian Inheritance in Man (OMIM), MedGen, Human Phenotype Ontology (HPO), ClinVar, and National Drug File Reference Terminology (NDF-RT). In addition, we manually annotated and extracted information from 200 randomly selected GeneReviews chapters and integrated it into GT2RDF. We performed two case studies to demonstrate the usability of GT2RDF. GT2RDF will serve as a data foundation to support the design of a genetic testing recommendation system, called iGenetics, which will ultimately accelerate the pace of precision medicine by actively and effectively incorporating innovative genetic technology in clinical settings. Abbreviations: GT2RDF: Genetic Testing data to RDF; SWT: Semantic web technology; OWL: Ontology Web Language; RDF: Resource Description Framework; SPARQL: SPARQL Protocol and RDF Query Language; GTR: Genetic Testing Registry; OMIM: Online Mendelian Inheritance in Man; HPO: Human Phenotype Ontology; NDF-RT: National Drug File Reference Terminology; UMLS: Unified Medical Language System.
Affiliation(s)
- Anamika Paul Rupa
  - Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD, US
- Sweta Singh
  - Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD, US
- Qian Zhu
  - Department of Information Systems, University of Maryland Baltimore County, Baltimore, MD, US
|
38
|
Luo Y, Uzuner Ö, Szolovits P. Bridging semantics and syntax with graph algorithms-state-of-the-art of extracting biomedical relations. Brief Bioinform 2017; 18:160-178. [PMID: 26851224 PMCID: PMC5221425 DOI: 10.1093/bib/bbw001] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 09/24/2015] [Revised: 11/29/2015] [Indexed: 01/18/2023] Open
Abstract
Research on extracting biomedical relations has received growing attention recently, with numerous biological and clinical applications including those in pharmacogenomics, clinical trial screening and adverse drug reaction detection. The ability to accurately capture both semantic and syntactic structures in text expressing these relations becomes increasingly critical to enable deep understanding of scientific papers and clinical narratives. Shared task challenges have been organized by both bioinformatics and clinical informatics communities to assess and advance the state-of-the-art research. Significant progress has been made in algorithm development and resource construction. In particular, graph-based approaches bridge semantics and syntax, often achieving the best performance in shared tasks. However, a number of problems at the frontiers of biomedical relation extraction continue to pose interesting challenges and present opportunities for great improvement and fruitful research. In this article, we place biomedical relation extraction against the backdrop of its versatile applications, present a gentle introduction to its general pipeline and shared resources, review the current state-of-the-art in methodology advancement, discuss limitations and point out several promising future directions.
Affiliation(s)
- Yuan Luo
  - Department of Preventive Medicine, Northwestern University, 11th Floor, Arthur Rubloff Building, 750 N. Lake Shore Drive, Chicago, IL, USA
- Özlem Uzuner
  - Department of Information Studies, State University of New York at Albany, New York, USA
- Peter Szolovits
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Massachusetts, USA
|
39
|
Stuermer M, Abu-Tayeh G, Myrach T. Digital sustainability: basic conditions for sustainable digital artifacts and their ecosystems. SUSTAINABILITY SCIENCE 2016; 12:247-262. [PMID: 30174752 PMCID: PMC6106115 DOI: 10.1007/s11625-016-0412-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/29/2016] [Accepted: 11/05/2016] [Indexed: 06/01/2023]
Abstract
The modern age has heralded a shift from the industrial society, in which natural resources are crucial input factors for the economy, towards a knowledge society. To date, sustainability literature has treated knowledge, and in particular digital artifacts, mainly as a means to the end of achieving sustainable development. In this conceptual paper, we argue that digital artifacts themselves ought also to be considered as resources, which also need to be sustainable. While over-consumption is the problem facing natural resources, underproduction and underuse are the biggest challenges for sustainable digital artifacts. In our view, the sustainability of digital artifacts improves their potential impact on sustainable development. A theoretical foundation for digital artifacts and their ecosystem allows us to present the relevant research on digital information, knowledge management, digital goods, and innovation literature. Based on these insights, we propose ten basic conditions for sustainable digital artifacts and their ecosystem to ensure that they provide the greatest possible benefit for sustainable development. We then apply those characteristics to four exemplary cases: Linux kernel development, Bitcoin cryptocurrency, the Wikipedia project, and the Linking Open Drug Data repositories. The paper concludes with a research agenda identifying topics for sustainability scholars and information systems academics, as well as practitioners. A number of suggestions for future studies on digital sustainability are also put forward.
|
40
|
Shen F, Lee Y. Knowledge Discovery from Biomedical Ontologies in Cross Domains. PLoS One 2016; 11:e0160005. [PMID: 27548262 PMCID: PMC4993478 DOI: 10.1371/journal.pone.0160005] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 03/17/2016] [Accepted: 07/12/2016] [Indexed: 01/19/2023] Open
Abstract
In recent years, there has been an increasing demand for sharing and integration of medical data in biomedical research. To improve a health care system, the integration of data must be supported by facilitating semantic interoperability systems and practices. Semantic interoperability is difficult to achieve in these systems as the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and support semantic interoperability between different ontologies. For this purpose, we focus on the discovery of semantic patterns about the association of relations in the heterogeneous information network representing different types of objects and relationships in multiple biological ontologies, and on the creation of a topic hierarchy through the analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross domain knowledge discovery from the multiple biological ontologies. Thus, patterns made a greater contribution to knowledge discovery across multiple ontologies. We have demonstrated cross domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with a cross domain query processing approach, namely SLAP. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies.
Affiliation(s)
- Feichen Shen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
- Yugyung Lee
  - School of Computing and Engineering, University of Missouri - Kansas City, Kansas City, Missouri, United States of America
|
41
|
Iyappan A, Kawalia SB, Raschka T, Hofmann-Apitius M, Senger P. NeuroRDF: semantic integration of highly curated data to prioritize biomarker candidates in Alzheimer's disease. J Biomed Semantics 2016; 7:45. [PMID: 27392431 PMCID: PMC4939021 DOI: 10.1186/s13326-016-0079-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/01/2015] [Accepted: 05/23/2016] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND: Neurodegenerative diseases are incurable and debilitating indications with huge social and economic impact, where much is still to be learnt about the underlying molecular events. Mechanistic disease models could offer a knowledge framework to help decipher the complex interactions that occur at molecular and cellular levels. This motivates the need for the development of an approach integrating highly curated and heterogeneous data into a disease model of different regulatory data layers. Although several disease models exist, they often do not consider the quality of underlying data. Moreover, even with the current advancements in semantic web technology, we still do not have a cure for complex diseases like Alzheimer's disease. One of the key reasons for this could be the increasing gap between generated data and the derived knowledge. RESULTS: In this paper, we describe an approach, called NeuroRDF, to develop an integrative framework for modeling curated knowledge in the area of complex neurodegenerative diseases. The core of this strategy lies in the usage of well curated and context specific data for integration into one single semantic web-based framework, RDF. This increases the probability that the derived knowledge is novel and reliable in a specific disease context. This infrastructure integrates highly curated data from databases (Bind, IntAct, etc.), literature (PubMed), and gene expression resources (such as GEO and ArrayExpress). We illustrate the effectiveness of our approach by asking real-world biomedical questions that link these resources to prioritize the plausible biomarker candidates. Among the 13 prioritized candidate genes, we identified MIF to be a potential emerging candidate due to its role as a pro-inflammatory cytokine. We additionally report on the effort and challenges faced during generation of such an indication-specific knowledge base comprising curated and quality-controlled data. CONCLUSION: Although many alternative approaches have been proposed and practiced for modeling diseases, semantic web technology is a flexible and well established solution for harmonized aggregation. The benefit of this work, namely the use of high quality and context specific data, becomes apparent in proposing previously unattended biomarker candidates around a well-known mechanism, which can be further leveraged for experimental investigations.
Affiliation(s)
- Anandhi Iyappan
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
  - Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113, Bonn, Germany
- Shweta Bagewadi Kawalia
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
  - Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113, Bonn, Germany
- Tamara Raschka
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
  - University of Applied Sciences Koblenz, RheinAhrCampus, Joseph-Rovan-Allee 2, 53424, Remagen, Germany
- Martin Hofmann-Apitius
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
  - Bonn-Aachen International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität Bonn, 53113, Bonn, Germany
- Philipp Senger
  - Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754, Sankt Augustin, Germany
|
42
|
Kaalia R, Ghosh I. Semantics based approach for analyzing disease-target associations. J Biomed Inform 2016; 62:125-35. [PMID: 27349858 DOI: 10.1016/j.jbi.2016.06.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2016] [Revised: 06/23/2016] [Accepted: 06/24/2016] [Indexed: 12/16/2022]
Abstract
BACKGROUND: A complex disease is caused by heterogeneous biological interactions between genes and their products along with the influence of environmental factors. There have been many attempts to understand the causes of these diseases using experimental, statistical and computational methods. The objective of the present work is to address the challenge of representing and integrating information from heterogeneous biomedical aspects of a complex disease using a semantics-based approach. METHODS: Semantic web technology is used to design the Disease Association Ontology (DAO-db) for representation and integration of disease-associated information, with diabetes as the case study. The functional associations of disease genes are integrated using RDF graphs of DAO-db. Three semantic web based scoring algorithms (PageRank, HITS (Hyperlink Induced Topic Search) and HITS with semantic weights) are used to score the gene nodes on the basis of their functional interactions in the graph. RESULTS: Disease Association Ontology for Diabetes (DAO-db) provides a standard ontology-driven platform for describing genes, proteins and pathways involved in diabetes and for integrating functional associations from various interaction levels (gene-disease, gene-pathway, gene-function, gene-cellular component and protein-protein interactions). An automatic instance loader module is also developed in the present work that helps in adding instances to DAO-db on a large scale. CONCLUSIONS: Our ontology provides a framework for querying and analyzing the disease-associated information in the form of RDF graphs. The methodology developed above is used to predict novel potential targets involved in diabetes from the long list of loose (statistically associated) gene-disease associations.
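To illustrate how a link-analysis score such as PageRank ranks nodes by their interactions, here is a minimal power-iteration sketch. It is not the authors' implementation: the graph, the node names (`GENE_A` and so on), the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration, and no semantic edge weights are used.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    `graph` maps each node to the list of nodes it links to; every
    target must also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "teleport" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical functional-interaction graph between gene nodes.
graph = {
    "GENE_A": ["GENE_C"],
    "GENE_B": ["GENE_C"],
    "GENE_C": ["GENE_A"],
    "GENE_D": ["GENE_C"],
}
ranks = pagerank(graph)
for gene, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{gene}: {score:.3f}")
```

In this toy graph the node with the most incoming links ends up with the highest score, which is the intuition behind using such scores to prioritise gene nodes.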
Affiliation(s)
- Rama Kaalia
  - School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
- Indira Ghosh
  - School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
|
43
|
Galgonek J, Hurt T, Michlíková V, Onderka P, Schwarz J, Vondrášek J. Advanced SPARQL querying in small molecule databases. J Cheminform 2016; 8:31. [PMID: 27275187 PMCID: PMC4893829 DOI: 10.1186/s13321-016-0144-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/06/2015] [Accepted: 05/25/2016] [Indexed: 11/14/2022] Open
Abstract
Background: In recent years, the Resource Description Framework (RDF) and the SPARQL query language have become more widely used in the area of cheminformatics and bioinformatics databases. These technologies allow better interoperability of various data sources and powerful searching facilities. However, we identified several deficiencies that make usage of such RDF databases restrictive or challenging for common users. Results: We extended a SPARQL engine to be able to use special procedures inside SPARQL queries. This allows the user to work with data that cannot be simply precomputed and thus cannot be directly stored in the database. We designed an algorithm that checks a query against data ontology to identify possible user errors. This greatly improves query debugging. We also introduced an approach to visualize retrieved data in a user-friendly way, based on templates describing visualizations of resource classes. To integrate all of our approaches, we developed a simple web application. Conclusions: Our system was implemented successfully, and we demonstrated its usability on the ChEBI database transformed into RDF form. To demonstrate procedure call functions, we employed compound similarity searching based on OrChem. The application is publicly available at https://bioinfo.uochb.cas.cz/projects/chemRDF.
Affiliation(s)
- Jakub Galgonek
  - Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo nám. 2, 166 10 Prague 6, Czech Republic
- Tomáš Hurt
  - Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Vendula Michlíková
  - Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Petr Onderka
  - Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Jan Schwarz
  - Faculty of Mathematics and Physics, Charles University in Prague, Malostranské nám. 25, 118 00 Prague 1, Czech Republic
- Jiří Vondrášek
  - Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo nám. 2, 166 10 Prague 6, Czech Republic
|
44
|
Shen F, Liu H, Sohn S, Larson DW, Lee Y. Predicate Oriented Pattern Analysis for Biomedical Knowledge Discovery. Intelligent Information Management 2016; 8:66-85. [PMID: 28983419 PMCID: PMC5626454 DOI: 10.4236/iim.2016.83006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/24/2022]
Abstract
In the current biomedical data movement, numerous efforts have been made to convert and normalize large amounts of traditional structured and unstructured data (e.g., EHRs, reports) into semi-structured data (e.g., RDF, OWL). With the increasing amount of semi-structured data coming into the biomedical community, data integration and knowledge discovery across heterogeneous domains have become an important research problem. At the application level, detecting related concepts among medical ontologies is an important goal of life science research. It is even more crucial to determine how different concepts are related within a single ontology or across multiple ontologies by analysing predicates in different knowledge bases. However, in today's information explosion, it is extremely difficult for biomedical researchers to find existing or potential predicates that link cross-domain concepts without support from schema pattern analysis. There is therefore a need for a mechanism that performs predicate-oriented pattern analysis to partition heterogeneous ontologies into smaller, more closely related topics and that generates queries to discover cross-domain knowledge from each topic. In this paper, we present such a model, which analyses predicate-oriented patterns based on how closely the predicates are related and generates a similarity matrix. Based on this similarity matrix, we apply an innovative unsupervised learning algorithm to partition large data sets into smaller, more closely related topics and generate meaningful queries to discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model with a colorectal surgical cohort from the Mayo Clinic.
Affiliation(s)
- Feichen Shen
  - CSEE Department, University of Missouri, Kansas City, MO, USA
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA
- Sunghwan Sohn
  - Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, USA
- David W Larson
  - Department of Surgery, Mayo Clinic College of Medicine, Rochester, MN, USA
- Yugyung Lee
  - CSEE Department, University of Missouri, Kansas City, MO, USA
|
45
|
Predicting drug target interactions using meta-path-based semantic network analysis. BMC Bioinformatics 2016; 17:160. [PMID: 27071755 PMCID: PMC4830032 DOI: 10.1186/s12859-016-1005-x] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 04/30/2015] [Accepted: 03/31/2016] [Indexed: 12/02/2022] Open
Abstract
Background: In the context of drug discovery, drug target interactions (DTIs) can be predicted based on observed topological features of a semantic network across the chemical and biological space. In a semantic network, the types of the nodes and links are different. In order to take into account the heterogeneity of the semantic network, meta-path-based topological patterns were investigated for link prediction. Results: Supervised machine learning models were constructed based on meta-path topological features of an enriched semantic network, which was derived from Chem2Bio2RDF, and was expanded by adding compound and protein similarity neighboring links obtained from the PubChem databases. The additional semantic links significantly improved the predictive performance of the supervised learning models. The binary classification model built upon the enriched feature space using the Random Forest algorithm significantly outperformed an existing semantic link prediction algorithm, Semantic Link Association Prediction (SLAP), to predict unknown links between compounds and protein targets in an evolving network. In addition to link prediction, Random Forest also has an intrinsic feature ranking algorithm, which can be used to select the important topological features that contribute to link prediction. Conclusions: The proposed framework has been demonstrated as a powerful alternative to SLAP in order to predict DTIs using the semantic network that integrates chemical, pharmacological, genomic, biological, functional, and biomedical information into a unified framework. It offers the flexibility to enrich the feature space by using different normalization processes on the topological features, and it can perform model construction and feature selection at the same time.
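To make the idea of a meta-path topological feature concrete, the following minimal Python sketch counts how many typed walks of a given meta-path connect a compound to a protein in a toy heterogeneous network. The node names, the edge list, and the function `count_metapath_instances` are hypothetical illustrations, not taken from the paper, and the paper's actual feature definitions and normalizations may differ.

```python
from collections import defaultdict


def count_metapath_instances(edges, node_type, source, target, metapath):
    """Count walks from source to target whose node types follow metapath.

    Assumption: edges are undirected and a walk may revisit nodes; this is
    one simple way to define a meta-path count, not the paper's definition.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    if node_type[source] != metapath[0]:
        return 0

    # counts[node] = number of type-conforming walks from source ending at node
    counts = {source: 1}
    for wanted_type in metapath[1:]:
        next_counts = defaultdict(int)
        for node, n_walks in counts.items():
            for neighbour in adjacency[node]:
                if node_type[neighbour] == wanted_type:
                    next_counts[neighbour] += n_walks
        counts = next_counts
    return counts.get(target, 0)


# Hypothetical toy network: three compounds and two proteins.
node_type = {
    "c1": "compound", "c2": "compound", "c3": "compound",
    "p1": "protein", "p2": "protein",
}
edges = [
    ("c1", "c2"), ("c1", "c3"),   # compound similarity links
    ("c2", "p1"), ("c3", "p1"),   # known compound-protein links
    ("c2", "p2"),
]

ccp = ["compound", "compound", "protein"]
feature_c1_p1 = count_metapath_instances(edges, node_type, "c1", "p1", ccp)
feature_c1_p2 = count_metapath_instances(edges, node_type, "c1", "p2", ccp)
```

Counts like these, computed for several meta-paths per compound-protein pair, could then serve as a feature vector for a supervised classifier such as the Random Forest mentioned in the abstract.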
|
46
|
Swainston N, Hastings J, Dekker A, Muthukrishnan V, May J, Steinbeck C, Mendes P. libChEBI: an API for accessing the ChEBI database. J Cheminform 2016; 8:11. [PMID: 26933452 PMCID: PMC4772646 DOI: 10.1186/s13321-016-0123-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 10/19/2015] [Accepted: 02/16/2016] [Indexed: 01/29/2023] Open
Abstract
Background: ChEBI is a database and ontology of chemical entities of biological interest. It is widely used as a source of identifiers to facilitate unambiguous reference to chemical entities within biological models, databases, ontologies and literature. ChEBI contains a wealth of chemical data, covering over 46,500 distinct chemical entities, and related data such as chemical formula, charge, molecular mass, structure, synonyms and links to external databases. Furthermore, ChEBI is an ontology, and thus provides meaningful links between chemical entities. Unlike many other resources, ChEBI is fully human-curated, providing a reliable, non-redundant collection of chemical entities and related data. While ChEBI is supported by a web service for programmatic access and a number of download files, it does not have an API library to facilitate the use of ChEBI and its data in cheminformatics software. Results: To provide this missing functionality, libChEBI, a comprehensive API library for accessing ChEBI data, is introduced. libChEBI is available in Java, Python and MATLAB versions from http://github.com/libChEBI, and provides full programmatic access to all data held within the ChEBI database through a simple and documented API. libChEBI is reliant upon the (automated) download and regular update of flat files that are held locally. As such, libChEBI can be embedded in both on- and off-line software applications. Conclusions: libChEBI allows better support of ChEBI and its data in the development of new cheminformatics software. Covering three key programming languages, it allows for the entirety of the ChEBI database to be accessed easily and quickly through a simple API. All code is open access and freely available.
Affiliation(s)
- Neil Swainston
  - Manchester Centre for Synthetic Biology of Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, M1 7DN UK; European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Janna Hastings
  - European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- Adriano Dekker
  - European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK
- John May
  - European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD UK; NextMove Software Ltd., Innovation Centre, Science Park, Milton Road, Cambridge, CB4 0EY UK
- Pedro Mendes
  - Manchester Centre for Synthetic Biology of Fine and Specialty Chemicals (SYNBIOCHEM), Manchester Institute of Biotechnology, University of Manchester, Manchester, M1 7DN UK; School of Computer Science, University of Manchester, Manchester, M13 9PL UK; Center for Quantitative Medicine, UConn Health, Farmington, CT 06030 USA
|
47
|
Hu B, Gifford E, Wang H, Bailey W, Johnson T. Analysis of the ToxCast Chemical-Assay Space Using the Comparative Toxicogenomics Database. Chem Res Toxicol 2015; 28:2210-23. [PMID: 26505644 DOI: 10.1021/acs.chemrestox.5b00369] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]
Abstract
Many studies have attempted to predict in vivo hazards based on the ToxCast in vitro assay results with the goal of using these predictions to prioritize compounds for conventional toxicity testing. Most of these conventional studies rely on in vivo end points observed using preclinical species (e.g., mice and rats). Although the preclinical animal studies provide valuable insights, there can often be significant disconnects between these studies and safety concerns in humans. One way to address these concerns, for an admittedly more limited set of compounds, is to explore relationships between the in vitro data from human cell lines and observations from human-related studies. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org) is a rich source of data linking chemicals to human diseases/adverse events and pathways. In this study we explored the relationships between ToxCast chemicals, their ToxCast in vitro test results, and their annotations of human disease/adverse event end points as captured in the CTD database. We mined these associations to identify potentially interesting, statistically significant in vitro assay and in vivo toxicity correlations. To the best of our knowledge, this is one of the first studies analyzing the relationships between the ToxCast in vitro assay results and the CTD disease/adverse event end point annotations. The in vitro profiles identified in this analysis may prove useful for prioritizing compounds for toxicity testing, suggesting mechanisms of toxicity, and forecasting potential in vivo human drug-induced injury.
Affiliation(s)
- Bingjie Hu
  - Structural Chemistry, Merck Research Laboratories, Merck & Co., West Point, Pennsylvania 19486, United States
- Eric Gifford
  - Structural Chemistry, Merck Research Laboratories, Merck & Co., West Point, Pennsylvania 19486, United States
- Huijun Wang
  - Structural Chemistry, Merck Research Laboratories, Merck & Co., Kenilworth, New Jersey 07033, United States
- Wendy Bailey
  - Safety Assessment and Laboratory Animal Resources, Merck Research Laboratories, Merck & Co., West Point, Pennsylvania 19486, United States
- Timothy Johnson
  - Safety Assessment and Laboratory Animal Resources, Merck Research Laboratories, Merck & Co., West Point, Pennsylvania 19486, United States
|
48
|
Optimizing drug-target interaction prediction based on random walk on heterogeneous networks. J Cheminform 2015; 7:40. [PMID: 26300984 PMCID: PMC4540752 DOI: 10.1186/s13321-015-0089-z] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 10/09/2014] [Accepted: 07/13/2015] [Indexed: 11/10/2022] Open
Abstract
Background: Predicting novel drug-target associations is important not only for developing new drugs, but also for furthering biological knowledge by understanding how drugs work and their modes of action. As more data about drugs, targets, and their interactions become available, computational approaches have become an indispensable part of drug target association discovery. In this paper we apply the random walk with restart (RWR) method to a heterogeneous network of drugs and targets compiled from the DrugBank database and investigate the performance of the method under parameter variation and choice of chemical fingerprint methods. Results: We show that the choice of chemical fingerprint does not affect the performance of the method when the parameters are tuned to optimal values. We use a subset of the ChEMBL15 dataset that contains 2,763 associations between 544 drugs and 467 target proteins to evaluate our method, and we extracted datasets of bioactivity ≤1 and ≤10 μM activity cutoff. For 1 μM bioactivity cutoff, we find that our method can correctly predict nearly 47, 55, 60% of the given drug-target interactions in the test dataset having more than 0, 1, 2 drug target relations for ChEMBL 1 μM dataset in top 50 rank positions. For 10 μM bioactivity cutoff, we find that our method can correctly predict nearly 32.4, 34.8, 35.3% of the given drug-target interactions in the test dataset having more than 0, 1, 2 drug target relations for ChEMBL 1 μM dataset in top 50 rank positions. We further examine the associations between 110 popular top selling drugs in 2012 and 3,519 targets and find the top ten targets for each drug. Conclusions: We demonstrate the effectiveness and promise of the approach (RWR on heterogeneous networks using chemical features) for identifying novel drug target interactions and investigate the performance.
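As a rough illustration of the random walk with restart named above, the sketch below iterates the standard update p ← (1 − r)·W·p + r·p₀ on a small toy graph until it converges. It is a generic RWR under stated assumptions, not the paper's implementation: the four-node graph, the restart probability of 0.7, and the function name `random_walk_with_restart` are all hypothetical, and the paper builds its transition matrix from drug similarity, target similarity, and interaction blocks in a more elaborate way.

```python
def random_walk_with_restart(adj, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker restarting at seed.

    adj is a square weighted adjacency matrix given as a list of lists.
    """
    n = len(adj)

    # Column-normalise the adjacency matrix so each column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical toy network. Node order: drug d1, drug d2, target t1, target t2.
# Edges: d1-d2 (drug similarity), d1-t1 and d2-t2 (known interactions).
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
scores = random_walk_with_restart(toy_adj, seed=0)
```

Ranking the target nodes by their entries in `scores` gives candidate targets for the seed drug; here the known target t1 scores above t2, which is only reachable through the similar drug d2.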
|
49
|
Abinaya E, Narang P, Bhardwaj A. FROG - Fingerprinting Genomic Variation Ontology. PLoS One 2015; 10:e0134693. [PMID: 26244889 PMCID: PMC4526677 DOI: 10.1371/journal.pone.0134693] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/11/2015] [Accepted: 07/13/2015] [Indexed: 11/19/2022] Open
Abstract
Genetic variations play a crucial role in differential phenotypic outcomes. Given the complexity in establishing this correlation and the enormous data available today, it is imperative to design machine-readable, efficient methods to store, label, search and analyze this data. A semantic approach, FROG: “FingeRprinting Ontology of Genomic variations”, is implemented to label variation data based on its location, function and interactions. FROG has six levels to describe the variation annotation, namely, chromosome, DNA, RNA, protein, variations and interactions. Each level is a conceptual aggregation of logically connected attributes, each of which comprises various properties for the variant. For example, at the chromosome level, one of the attributes is location of variation, which has two properties, allosomes or autosomes. Another attribute is variation kind, which has four properties, namely, indel, deletion, insertion and substitution. Likewise, there are 48 attributes and 278 properties to capture the variation annotation across six levels. Each property is then assigned a bit score, which in turn leads to generation of a binary fingerprint based on the combination of these properties (mostly taken from existing variation ontologies). FROG is a novel and unique method designed for the purpose of labeling the entire variation data generated to date for efficient storage, search and analysis. A web-based platform is designed as a test case for users to navigate sample datasets and generate fingerprints. The platform is available at http://ab-openlab.csir.res.in/frog.
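The bit-per-property encoding described above can be sketched in a few lines of Python. The example uses only the two attributes and six properties named in the abstract; the bit ordering, the dictionary-based annotation format, and the function name `fingerprint` are assumptions made for illustration, and the real FROG scheme covers 48 attributes and 278 properties.

```python
# Ordered property vocabulary: each (attribute, property) pair owns one bit.
# The ordering is an assumption made for this sketch.
PROPERTIES = [
    ("location", "allosomes"),
    ("location", "autosomes"),
    ("variation_kind", "indel"),
    ("variation_kind", "deletion"),
    ("variation_kind", "insertion"),
    ("variation_kind", "substitution"),
]


def fingerprint(annotation):
    """Encode an {attribute: property} annotation as a binary string."""
    return "".join(
        "1" if annotation.get(attribute) == value else "0"
        for attribute, value in PROPERTIES
    )


def shared_bits(fp_a, fp_b):
    """Count positions where both fingerprints have a set bit."""
    return sum(1 for a, b in zip(fp_a, fp_b) if a == "1" and b == "1")


# Hypothetical variants, annotated with the two attributes above.
variant_a = {"location": "autosomes", "variation_kind": "substitution"}
variant_b = {"location": "autosomes", "variation_kind": "deletion"}

fp_a = fingerprint(variant_a)   # "010001"
fp_b = fingerprint(variant_b)   # "010100"
overlap = shared_bits(fp_a, fp_b)
```

Because every variant maps to a fixed-length bit string, searching for variants with a given property reduces to checking a bit position, and simple overlap measures such as `shared_bits` can compare two variants.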
Affiliation(s)
- E. Abinaya
  - Department of Bioinformatics, SASTRA University, Thanjavur, Tamil Nadu, India
- Pankaj Narang
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Anshu Bhardwaj
  - Open Source Drug Discovery Unit, Council of Scientific and Industrial Research (CSIR), Anusandhan Bhawan, 2 Rafi Marg, New Delhi, 110001, India
|
50
|
Fu G, Batchelor C, Dumontier M, Hastings J, Willighagen E, Bolton E. PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform 2015; 7:34. [PMID: 26175801 PMCID: PMC4500850 DOI: 10.1186/s13321-015-0084-4] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 02/02/2015] [Accepted: 06/22/2015] [Indexed: 12/02/2022] Open
Abstract
Background: PubChem is an open repository for chemical structures, biological activities and biomedical annotations. Semantic Web technologies are emerging as an increasingly important approach to distribute and integrate scientific data. Exposing PubChem data to Semantic Web services may help enable automated data integration and management, as well as facilitate interoperable web applications. Description: This work, one of a series covering the PubChemRDF project, describes an approach to translate PubChem Substance and Compound information into Resource Description Framework (RDF) format. Basic examples are provided to demonstrate its use. The aim of this effort is to provide two new primary benefits to researchers in a cost-effective manner. Firstly, we aim to remove the inherent limitations of using the web-based resource PubChem by allowing a researcher to use readily available semantic technologies (namely, RDF triple stores and their corresponding SPARQL query engines) to query and analyze PubChem data on local computing resources. Secondly, this work intends to help improve data sharing, analysis, and integration of PubChem data to resources external to NCBI and across scientific domains, by means of the association of PubChem data to existing ontological frameworks, including CHEMical INFormation ontology, Semanticscience Integrated Ontology, and others. Conclusions: With the goal of semantically describing information available in the PubChem archive, pre-existing ontological frameworks were used, rather than creating new ones. Semantic relationships between compounds and substances, chemical descriptors associated with compounds and substances, interrelationships between chemicals, as well as provenance and attribute metadata of substances are described.
Affiliation(s)
- Gang Fu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
- Colin Batchelor
  - Royal Society of Chemistry, Thomas Graham House, Cambridge, UK
- Michel Dumontier
  - Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, USA
- Janna Hastings
  - European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Egon Willighagen
  - Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Evan Bolton
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD USA
|