1
Daza D, Alivanistos D, Mitra P, Pijnenburg T, Cochez M, Groth P. BioBLP: a modular framework for learning on multimodal biomedical knowledge graphs. J Biomed Semantics 2023; 14:20. [PMID: 38066573 PMCID: PMC10709903 DOI: 10.1186/s13326-023-00301-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND: Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain.
OBJECTIVE: We aim to understand how to incorporate multimodal data into biomedical KG embeddings, and analyze the resulting performance in comparison with traditional methods. We propose a modular framework for learning embeddings in KGs with entity attributes that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction and drug-protein interaction prediction, comparing against methods that do not take attribute data into account.
RESULTS: In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. Further analyses show that incorporating attribute data does outperform baselines over entities below a certain node degree, comprising approximately 75% of the diseases in the graph. We also observe that optimizing attribute encoders is a challenging task that increases optimization costs. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime.
CONCLUSION: BioBLP allows the investigation of different ways of incorporating multimodal biomedical data for learning representations in KGs. With a particular implementation, we find that incorporating attribute data does not consistently outperform baselines, but improvements are obtained on a comparatively large subset of entities below a specific node degree. Our results indicate a potential for improved performance in scientific discovery tasks where understudied areas of the KG would benefit from link prediction methods.
Affiliation(s)
- Daniel Daza
  - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - University of Amsterdam, Amsterdam, The Netherlands
  - Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Dimitrios Alivanistos
  - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Michael Cochez
  - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  - Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Paul Groth
  - University of Amsterdam, Amsterdam, The Netherlands
  - Discovery Lab, Elsevier, Amsterdam, The Netherlands
2
Automated QSPR modeling and data curation of physicochemical properties using KNIME platform: Prediction of partition coefficients. J INDIAN CHEM SOC 2022. [DOI: 10.1016/j.jics.2022.100672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
3
Drug Design by Pharmacophore and Virtual Screening Approach. Pharmaceuticals (Basel) 2022; 15:ph15050646. [PMID: 35631472 PMCID: PMC9145410 DOI: 10.3390/ph15050646] [Citation(s) in RCA: 78] [Impact Index Per Article: 39.0] [Received: 04/30/2022] [Revised: 05/18/2022] [Accepted: 05/21/2022] [Indexed: 12/20/2022]
Abstract
Computer-aided drug discovery techniques reduce the time and the costs needed to develop novel drugs. Their relevance becomes increasingly evident with the needs arising from health emergencies and with the spread of personalized medicine. Pharmacophore approaches are among the most interesting tools developed: they define the molecular functional features needed for a molecule to bind to a given receptor, and then direct the virtual screening of large collections of compounds to select optimal candidates. Computational tools to create the pharmacophore model and to perform virtual screening are available and have generated successful studies. This article describes the procedure of pharmacophore modelling followed by virtual screening, the most widely used software, possible limitations of the approach, and some applications reported in the literature.
4
Karim MR, Michel A, Zappa A, Baranov P, Sahay R, Rebholz-Schuhmann D. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Brief Bioinform 2019; 19:1035-1050. [PMID: 28419324 PMCID: PMC6169675 DOI: 10.1093/bib/bbx039] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/21/2016] [Indexed: 11/22/2022]
Abstract
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community.
Affiliation(s)
- Md Rezaul Karim
  - Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
- Audrey Michel
  - School of Biochemistry and Cell Biology, University College Cork, Ireland
- Achille Zappa
  - Insight Centre for Data Analytics, National University of Ireland Galway, Dangan, Galway, Ireland
- Pavel Baranov
  - School of Biochemistry and Cell Biology, University College Cork, Ireland
- Ratnesh Sahay
  - Semantics in eHealth and Life Sciences (SeLS), Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
5
Kanza S, Frey JG. A new wave of innovation in Semantic web tools for drug discovery. Expert Opin Drug Discov 2019; 14:433-444. [DOI: 10.1080/17460441.2019.1586880] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Affiliation(s)
- Samantha Kanza
  - Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
- Jeremy Graham Frey
  - Department of Chemistry, Highfield Campus, University of Southampton, Southampton, UK
6
Miller RA, Woollard P, Willighagen EL, Digles D, Kutmon M, Loizou A, Waagmeester A, Senger S, Evelo CT. Explicit interaction information from WikiPathways in RDF facilitates drug discovery in the Open PHACTS Discovery Platform. F1000Res 2018; 7:75. [PMID: 30416713 PMCID: PMC6206606 DOI: 10.12688/f1000research.13197.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 09/24/2018] [Indexed: 12/11/2022]
Abstract
Open PHACTS is a pre-competitive project to answer scientific questions developed recently by the pharmaceutical industry. Having high quality biological interaction information in the Open PHACTS Discovery Platform is needed to answer multiple pathway related questions. To address this, updated WikiPathways data has been added to the platform. This data includes information about biological interactions, such as stimulation and inhibition. The platform's Application Programming Interface (API) was extended with appropriate calls to reference these interactions. These new methods of the Open PHACTS API are available now.
Affiliation(s)
- Ryan A Miller
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
- Egon L Willighagen
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
- Daniela Digles
  - Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Martina Kutmon
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
  - Maastricht Center for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Andra Waagmeester
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
  - Micelio, Antwerp, Belgium
- Chris T Evelo
  - Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
  - Maastricht Center for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
  - Open PHACTS Foundation, Science Park, Cambridge, UK
7
Abstract
The Open PHACTS Discovery Platform integrates several public databases, which can be of interest when annotating the results of a phenotypic screening campaign. Workflow tools provide easy-to-customize possibilities to access the platform. Here, we describe how to create such workflows for two different workflow tools (KNIME and Pipeline Pilot), including a protocol to annotate compounds (e.g., phenotypic screening hits) with compound classification, known protein targets, and classifications of the targets.
Affiliation(s)
- Daniela Digles
  - Department of Pharmaceutical Chemistry, University of Vienna, Vienna, Austria
- Edgar Jacoby
  - Janssen Research and Development, Beerse, Belgium
8
Senger S. Assessment of the significance of patent-derived information for the early identification of compound-target interaction hypotheses. J Cheminform 2017; 9:26. [PMID: 29086108 PMCID: PMC5400772 DOI: 10.1186/s13321-017-0214-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/02/2016] [Accepted: 04/13/2017] [Indexed: 11/16/2022]
Abstract
Background: Patents are an important source of information for effective decision making in drug discovery. Encouragingly, freely accessible patent-chemistry databases are now in the public domain. However, at present there is still a wide gap between relatively low coverage-high quality manually-curated data sources and high coverage data sources that use text mining and automated extraction of chemical structures. To secure much needed funding for further research and an improved infrastructure, hard evidence is required to demonstrate the significance of patent-derived information in drug discovery. Surprisingly little such evidence has been reported so far. To address this, the present study attempts to quantify the relevance of patents for formulating and substantiating hypotheses for compound–target interactions.
Results: A manually-curated set of 130 compound–target interaction pairs annotated with what are considered to be the earliest patent and publication has been produced. The analysis of this set revealed that in stark contrast to what has been reported for novel chemical structures, only about 10% of the compound–target interaction pairs could be found in publications in the scientific literature within one year of being reported in patents. The average delay across all interaction pairs is close to 4 years. In an attempt to benchmark current capabilities, it was also examined how much of the benefit of using patent-derived information can be retained when a bioannotated version of SureChEMBL is used as secondary source for the patent literature. Encouragingly, this approach found the patents in the annotated set for 72% of the compound–target interaction pairs. Similarly, the effect of using the bioactivity database ChEMBL as secondary source for the scientific literature was studied. Here, the publications from the annotated set were only found for 46% of the compound–target interaction pairs.
Conclusion: Patent-derived information is a significant enabler for formulating compound–target interaction hypotheses even in cases where the respective interaction is later reported in the scientific literature. The findings of this study clearly highlight the significance of future investments in the development and provision of databases and tools that will allow scientists to search patent information in a comprehensive, reliable, and efficient manner.
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-017-0214-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Stefan Senger
  - GlaxoSmithKline, Stevenage, Hertfordshire, SG1 2NY, UK
9
Goldmann D, Zdrazil B, Digles D, Ecker GF. Empowering pharmacoinformatics by linked life science data. J Comput Aided Mol Des 2017; 31:319-328. [PMID: 27830428 PMCID: PMC5385323 DOI: 10.1007/s10822-016-9990-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2016] [Accepted: 10/24/2016] [Indexed: 11/11/2022]
Abstract
With the public availability of large data sources such as ChEMBLdb and the Open PHACTS Discovery Platform, retrieval of data sets for certain protein targets of interest with consistent assay conditions is no longer a time-consuming process. In particular, workflow engines such as KNIME or Pipeline Pilot allow complex queries and make it possible to search for several targets simultaneously. Data can then be used directly as input to various ligand- and structure-based studies. In this contribution, using in-house projects on P-gp inhibition, transporter selectivity, and TRPV1 modulation, we outline how incorporating linked life science data into the daily execution of projects allowed us to expand our approaches from conventional Hansch analysis to complex, integrated multilayer models.
Affiliation(s)
- Daria Goldmann
  - Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
- Barbara Zdrazil
  - Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
- Daniela Digles
  - Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
- Gerhard F Ecker
  - Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, 1090, Vienna, Austria
10
Siragusa L, Luciani R, Borsari C, Ferrari S, Costi MP, Cruciani G, Spyrakis F. Comparing Drug Images and Repurposing Drugs with BioGPS and FLAPdock: The Thymidylate Synthase Case. ChemMedChem 2016; 11:1653-66. [PMID: 27404817 DOI: 10.1002/cmdc.201600121] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/29/2016] [Revised: 06/08/2016] [Indexed: 12/14/2022]
Abstract
Repurposing and repositioning drugs has become a frequently pursued and successful strategy in the current era, as new chemical entities are increasingly difficult to find and get approved. Herein we report an integrated BioGPS/FLAPdock pipeline for rapid and effective off-target identification and drug repurposing. Our method is based on the structural and chemical properties of protein binding sites, that is, the ligand image, encoded in the GRID molecular interaction fields (MIFs). Protein similarity is disclosed through the BioGPS algorithm by measuring the pockets' overlap according to which pockets are clustered. Co-crystallized and known ligands can be cross-docked among similar targets, selected for subsequent in vitro binding experiments, and possibly improved for inhibitory potency. We used human thymidylate synthase (TS) as a test case and searched the entire RCSB Protein Data Bank (PDB) for similar target pockets. We chose casein kinase IIα as a control and tested a series of its inhibitors against the TS template. Ellagic acid and apigenin were identified as TS inhibitors, and various flavonoids were selected and synthesized in a second-round selection. The compounds were demonstrated to be active in the low-micromolar range.
Affiliation(s)
- Lydia Siragusa
  - Molecular Discovery Limited, 215 Marsh Road, Pinner Middlesex, London, HA5 5NE, UK
- Rosaria Luciani
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Chiara Borsari
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Stefania Ferrari
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Maria Paola Costi
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
- Gabriele Cruciani
  - Department of Chemistry, Biology and Biotechnology, University of Perugia, Via Elce di Sotto 8, 06123, Perugia, Italy
- Francesca Spyrakis
  - Department of Life Sciences, University of Modena and Reggio Emilia, Via Campi 103, 41125, Modena, Italy
  - Department of Food Science, University of Parma, Viale delle Scienze 17A, 43124, Parma, Italy
11
Thomas S, Wolstencroft K, de Bono B, Hunter PJ. A physiome interoperability roadmap for personalized drug development. Interface Focus 2016; 6:20150094. [PMID: 27051513 DOI: 10.1098/rsfs.2015.0094] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/25/2022]
Abstract
The goal of developing therapies and dosage regimes for characterized subgroups of the general population can be facilitated by the use of simulation models able to incorporate information about inter-individual variability in drug disposition (pharmacokinetics), toxicity and response effect (pharmacodynamics). Such observed variability can have multiple causes at various scales, ranging from gross anatomical differences to differences in genome sequence. Relevant data for many of these aspects, particularly related to molecular assays (known as '-omics'), are available in online resources, but identification and assignment to appropriate model variables and parameters is a significant bottleneck in the model development process. Through its efforts to standardize annotation with consequent increase in data usability, the human physiome project has a vital role in improving productivity in model development and, thus, the development of personalized therapy regimes. Here, we review the current status of personalized medicine in clinical practice, outline some of the challenges that must be overcome in order to expand its applicability, and discuss the relevance of personalized medicine to the more widespread challenges being faced in drug discovery and development. We then review some of (i) the key data resources available for use in model development and (ii) the potential areas where advances made within the physiome modelling community could contribute to physiologically based pharmacokinetic and physiologically based pharmacokinetic/pharmacodynamic modelling in support of personalized drug development. We conclude by proposing a roadmap to further guide the physiome community in its on-going efforts to improve data usability, and integration with modelling efforts in the support of personalized medicine development.
Affiliation(s)
- Simon Thomas
  - Cyprotex Discovery Ltd, 15 Beech Lane, Macclesfield SK10 2DR, UK
- Katherine Wolstencroft
  - Leiden Institute of Advanced Computer Science, Leiden University, 111 Snellius, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
- Bernard de Bono
  - Farr Institute, University College London, London NW1 2DA, UK
  - Auckland Bioengineering Institute, The University of Auckland, Auckland 1010, New Zealand
- Peter J Hunter
  - Auckland Bioengineering Institute, The University of Auckland, Auckland 1010, New Zealand
12
Nongonierma AB, FitzGerald RJ. Strategies for the discovery, identification and validation of milk protein-derived bioactive peptides. Trends Food Sci Technol 2016. [DOI: 10.1016/j.tifs.2016.01.022] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 01/22/2023]
13
Mellor CL, Steinmetz FP, Cronin MTD. Using Molecular Initiating Events to Develop a Structural Alert Based Screening Workflow for Nuclear Receptor Ligands Associated with Hepatic Steatosis. Chem Res Toxicol 2016; 29:203-12. [DOI: 10.1021/acs.chemrestox.5b00480] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 11/30/2022]
Affiliation(s)
- Claire L. Mellor
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- Fabian P. Steinmetz
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- Mark T. D. Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
14
Bolton E. Reporting biological assay screening results for maximum impact. DRUG DISCOVERY TODAY. TECHNOLOGIES 2015; 14:31-6. [PMID: 26194585 DOI: 10.1016/j.ddtec.2015.03.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/15/2015] [Revised: 03/18/2015] [Accepted: 03/29/2015] [Indexed: 11/19/2022]
Abstract
A very large corpus of biological assay screening results exists in the public domain. The ability to compare and analyze these data is hampered by missing details and the lack of a commonly used terminology to describe assay protocols and assay endpoints. Minimum reporting guidelines exist that, if followed, would greatly enhance the utility of biological assay screening data so that it may be independently reproduced, readily integrated, effectively compared, and rapidly analyzed.
Affiliation(s)
- Evan Bolton
  - National Center for Biotechnology Information, Bldg. 38A/8S810, National Library of Medicine, U.S. National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA