1. Giordano M, Maddalena L, Manzo M, Guarracino MR. Adversarial attacks on graph-level embedding methods: a case study. Ann Math Artif Intell 2023; 91:259-285. [DOI: 10.1007/s10472-022-09811-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 07/27/2022] [Indexed: 09/02/2023]
Abstract
As the number of graph-level embedding techniques increases at an unprecedented speed, questions arise about their behavior and performance when training data undergo perturbations. This is the case when an external entity maliciously alters training data to invalidate the embedding. This paper explores the effects of such attacks on several graph datasets by applying different graph-level embedding techniques. The main attack strategy involves manipulating training data to produce an altered model. In this context, our goal is to examine in depth the methods, resources, experimental settings, and performance results, so as to observe and study every aspect that derives from the attack stage.
2. A weighted-link graph neural network for lung cancer knowledge classification. Appl Intell 2023. [DOI: 10.1007/s10489-022-04437-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/10/2023]
3. Qi G, Xu Z, Dan H, Jia X, Jiang Q, Zhang A, Li Z, Liu X, Ma J, Zheng X, Li Z. A Complex Heterogeneous Network Model of Disease Regulated by Noncoding RNAs: A Case Study of Unstable Angina Pectoris. Comput Intell Neurosci 2022; 2022:5852089. [PMID: 36590836 PMCID: PMC9803582 DOI: 10.1155/2022/5852089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2022] [Revised: 11/27/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022]
Abstract
MicroRNAs (miRNAs) are important types of noncoding RNAs, and there is a lack of holistic and systematic understanding of the functions they play in disease. We proposed a research strategy comprising two parts, network analysis and network modelling, to analyze, model, and predict the regulatory network of miRNAs from a network perspective, using unstable angina pectoris as an example. In the network analysis part, we proposed the WGCNA & SimCluster method, which uses both correlation and similarity to find hub miRNAs; validation on two datasets showed better results than methods using correlation or similarity alone. In the network modelling part, we used six knowledge graph or graph neural network models for link prediction of three types of edges and multilabel classification of two types of nodes. Comparative experiments showed that the RotatE model was a good model for link prediction, while the RGCN model was the best model for multilabel classification. Potential target genes were predicted for the hub miRNAs, and a three-step validation approach was applied to the hub miRNA-target gene interactions, the target genes as biomarkers, and the target gene functions. In conclusion, our study provides a new strategy to analyze and model miRNA regulatory networks.
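The abstract names RotatE as a good link-prediction model in this study. As a rough illustration of what that model scores, the Python sketch below implements the RotatE plausibility function: each relation acts as an element-wise rotation of complex entity embeddings, and a triple (h, r, t) is plausible when the rotated head lands near the tail. The embeddings and the names `mirna`, `regulates`, `gene`, and `decoy` are hypothetical toy values, not vectors from the study, and the distance is taken here as the sum of per-dimension complex moduli. This is a sketch of the scoring step only, not the authors' trained model.

```python
import cmath
import math
from typing import Sequence


def rotate_score(head: Sequence[complex],
                 phases: Sequence[float],
                 tail: Sequence[complex]) -> float:
    """Negated RotatE distance for one triple; higher means more plausible."""
    if not (len(head) == len(phases) == len(tail)):
        raise ValueError("head, relation and tail must have the same dimension")
    # The relation acts as a unit-modulus rotation e^(i*theta) in each dimension.
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    # Residual (h * r - t), summed over the complex modulus of each dimension.
    distance = sum(abs(r - t) for r, t in zip(rotated, tail))
    return -distance


# Hypothetical 2-dimensional embeddings (toy values, not learned).
mirna = [1 + 0j, 0 + 1j]
regulates = [math.pi / 2, math.pi]  # rotate by 90 and 180 degrees
gene = [0 + 1j, 0 - 1j]             # exactly the rotated head
decoy = [1 + 0j, 1 + 0j]            # a corrupted tail

true_score = rotate_score(mirna, regulates, gene)
decoy_score = rotate_score(mirna, regulates, decoy)
print(true_score > decoy_score)  # True
```

Training (negative sampling and a margin-based loss) is omitted; only the scoring function that ranks candidate links is shown.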
Affiliation(s)
- Guanpeng Qi
  - School of Pharmacy, Shenyang Pharmaceutical University, Shenyang 110016, China
- Ze Xu
  - School of Pharmacy, Shenyang Pharmaceutical University, Shenyang 110016, China
- Hanyu Dan
  - School of Medical Devices, Shenyang Pharmaceutical University, Shenyang 110016, China
- Xiangnan Jia
  - School of Medical Devices, Shenyang Pharmaceutical University, Shenyang 110016, China
- Qiang Jiang
  - School of Medical Devices, Shenyang Pharmaceutical University, Shenyang 110016, China
- Aijun Zhang
  - School of Pharmacy, Shenyang Pharmaceutical University, Shenyang 110016, China
- Zhaohang Li
  - School of Pharmacy, Shenyang Pharmaceutical University, Shenyang 110016, China
- Xin Liu
  - School of Life Sciences and Biopharmaceuticals, Shenyang Pharmaceutical University, Shenyang 110016, China
- Juman Ma
  - School of Life Sciences and Biopharmaceuticals, Shenyang Pharmaceutical University, Shenyang 110016, China
- Xiaosong Zheng
  - School of Medical Devices, Shenyang Pharmaceutical University, Shenyang 110016, China
- Zuojing Li
  - School of Medical Devices, Shenyang Pharmaceutical University, Shenyang 110016, China
4. Fernández-Torras A, Duran-Frigola M, Bertoni M, Locatelli M, Aloy P. Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque. Nat Commun 2022; 13:5304. [PMID: 36085310 PMCID: PMC9463154 DOI: 10.1038/s41467-022-33026-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/06/2022] [Accepted: 08/30/2022] [Indexed: 12/25/2022] [Open Access]
Abstract
Biomedical data are accumulating at a fast pace, and integrating them into a unified framework, so that multiple views of a given biological event can be considered simultaneously, is a major challenge. Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated biomedical descriptors derived from a gigantic knowledge graph comprising more than 450 thousand biological entities and 30 million relationships between them. The Bioteque integrates, harmonizes, and formats data collected from over 150 data sources, covering 12 types of biological entity (e.g., genes, diseases, drugs) linked by 67 types of association (e.g., 'drug treats disease', 'gene interacts with gene'). We show how Bioteque descriptors facilitate the assessment of high-throughput protein-protein interactome data and the prediction of drug response and new repurposing opportunities, and we demonstrate that they can be used off-the-shelf in downstream machine learning tasks without loss of performance relative to using the original data. The Bioteque thus offers a thoroughly processed, tractable, and highly optimized assembly of the biomedical knowledge available in the public domain.
Affiliation(s)
- Adrià Fernández-Torras
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Miquel Duran-Frigola
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Ersilia Open Source Initiative, Cambridge, UK
- Martino Bertoni
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Martina Locatelli
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Patrick Aloy
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
5. Identifying genes targeted by disease-associated non-coding SNPs with a protein knowledge graph. PLoS One 2022; 17:e0271395. [PMID: 35830458 PMCID: PMC9278741 DOI: 10.1371/journal.pone.0271395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2022] [Accepted: 06/29/2022] [Indexed: 12/24/2022] [Open Access]
Abstract
Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) that play important roles in the genetic heritability of traits and diseases. Because most of these SNPs are located in the non-coding part of the genome, it is currently assumed that they influence the expression of nearby genes. However, identifying which genes are targeted by these disease-associated SNPs remains challenging. In the past, protein knowledge graphs have often been used to identify genes that are associated with disease, also referred to as "disease genes". Here, we explore whether protein knowledge graphs can be used to identify genes that are targeted by disease-associated non-coding SNPs by testing and comparing the performance of six existing methods for a protein knowledge graph, four of which were developed for disease gene identification. We compare their performance against two baselines: (1) an existing state-of-the-art method that is based on guilt-by-association, and (2) the leading assumption that SNPs target the nearest gene on the genome. We test these methods with four reference sets, three of which were obtained by different means. Furthermore, we combine methods to investigate whether their combination improves performance. We find that protein knowledge graphs that include predicate information perform comparably to the current state of the art, achieving an area under the receiver operating characteristic curve (AUC) of 79.6% on average across all four reference sets. Protein knowledge graphs that lack predicate information perform comparably to our other baseline (genetic distance), which achieved an AUC of 75.7% across all four reference sets. Combining multiple methods improved performance to 84.9% AUC. We conclude that methods for a protein knowledge graph can be used to identify which genes are targeted by disease-associated non-coding SNPs.
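The second baseline in this abstract, the assumption that a non-coding SNP targets the nearest gene on the genome, can be sketched in a few lines of Python. The gene names and coordinates below are hypothetical, and distance is measured here to the gene body (zero when the SNP falls inside a gene); the study may define genetic distance differently, for example from the transcription start site.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # first base of the gene body
    end: int    # last base of the gene body (inclusive)


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Base pairs between a SNP position and a gene body (0 if inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end


def nearest_gene(chrom: str, pos: int, genes: List[Gene]) -> Optional[Gene]:
    """Nearest-gene baseline: assign the SNP to the closest gene on its chromosome."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    if not same_chrom:
        return None
    return min(same_chrom, key=lambda g: distance_to_gene(pos, g))


# Hypothetical annotation and SNP, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 1_000, 2_000),
    Gene("GENE_B", "chr1", 5_000, 6_000),
    Gene("GENE_C", "chr2", 100, 200),
]
hit = nearest_gene("chr1", 4_200, genes)
print(hit.name if hit else None)  # GENE_B
```

Ties are resolved in favor of the first gene listed; a real pipeline would also need a genome annotation and a rule for SNPs with no gene on the same chromosome.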
6. Chen X, Xie H, Li Z, Cheng G. Topic analysis and development in knowledge graph research: A bibliometric review on three decades. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.098] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]
7. Manzo M, Giordano M, Maddalena L, Guarracino MR. Performance Evaluation of Adversarial Attacks on Whole-Graph Embedding Models. Lect Notes Comput Sci 2021:219-236. [DOI: 10.1007/978-3-030-92121-7_19] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 09/02/2023]
8. Vlietstra WJ, Vos R, van den Akker M, van Mulligen EM, Kors JA. Identifying disease trajectories with predicate information from a knowledge graph. J Biomed Semantics 2020; 11:9. [PMID: 32819419 PMCID: PMC7439632 DOI: 10.1186/s13326-020-00228-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/14/2020] [Accepted: 08/12/2020] [Indexed: 12/28/2022] [Open Access]
Abstract
BACKGROUND: Knowledge graphs can represent the contents of biomedical literature and databases as subject-predicate-object triples, thereby enabling comprehensive analyses that identify, e.g., relationships between diseases. Some diseases are often diagnosed in patients in specific temporal sequences, which are referred to as disease trajectories. Here, we determine whether a sequence of two diseases forms a trajectory by leveraging the predicate information from paths between (disease) proteins in a knowledge graph. Furthermore, we determine the added value of directional information of predicates for this task. To do so, we create four feature sets, based on two methods for representing indirect paths, each with and without directional information of predicates (i.e., which protein is considered the subject and which the object). The added value of the directional information of predicates is quantified by comparing the classification performance of the feature sets that include or exclude it.

RESULTS: Our method achieved a maximum area under the ROC curve of 89.8% and 74.5% when evaluated with two different reference sets. Use of directional information of predicates significantly improved performance by 6.5 and 2.0 percentage points, respectively.

CONCLUSIONS: Our work demonstrates that predicates between proteins can be used to identify disease trajectories. Using the directional information of predicates significantly improved performance over not using this information.
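To make the distinction between directional and non-directional predicate features concrete, the Python sketch below enumerates simple paths of up to two edges between two proteins in a small triple store and records each path as its sequence of predicates. With `directed=True`, every predicate is tagged `>` when the path follows it from subject to object and `<` when it is traversed backwards; with `directed=False`, the tags are dropped. The triples, the protein identifiers, and the `|`-joined path encoding are hypothetical: the abstract does not specify the paper's two indirect-path representations, so this is not the authors' feature extraction.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Edge = Tuple[str, str, str]    # (predicate, direction tag, neighbour)


def build_adjacency(triples: List[Triple]) -> Dict[str, List[Edge]]:
    adj: Dict[str, List[Edge]] = {}
    for s, p, o in triples:
        # Index each triple from both ends, remembering the traversal direction.
        adj.setdefault(s, []).append((p, ">", o))
        adj.setdefault(o, []).append((p, "<", s))
    return adj


def path_features(triples: List[Triple], source: str, target: str,
                  directed: bool = True, max_len: int = 2) -> Counter:
    """Count predicate sequences on simple paths from source to target."""
    adj = build_adjacency(triples)
    features: Counter = Counter()
    # Depth-first search; each stack entry is (node, labels so far, visited set).
    stack = [(source, (), frozenset([source]))]
    while stack:
        node, labels, visited = stack.pop()
        for pred, arrow, nxt in adj.get(node, []):
            if nxt in visited:
                continue
            new_labels = labels + (pred + arrow if directed else pred,)
            if nxt == target:
                features["|".join(new_labels)] += 1
            elif len(new_labels) < max_len:
                stack.append((nxt, new_labels, visited | {nxt}))
    return features


# Hypothetical triples between two disease proteins P1 and P3.
triples = [
    ("P1", "activates", "X"),
    ("P3", "inhibits", "X"),
    ("P1", "binds", "P3"),
]
print(sorted(path_features(triples, "P1", "P3").items()))
# [('activates>|inhibits<', 1), ('binds>', 1)]
print(sorted(path_features(triples, "P1", "P3", directed=False).items()))
# [('activates|inhibits', 1), ('binds', 1)]
```

The resulting counts could serve as a feature vector for a classifier; comparing models trained on the directed and undirected variants mirrors, in spirit, the comparison described in the abstract.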
Affiliation(s)
- Wytze J. Vlietstra
  - Department of Medical Informatics, Erasmus University Medical Center, Dr. Molewaterplein 50, 3015 GE Rotterdam, the Netherlands
- Rein Vos
  - Department of Medical Informatics, Erasmus University Medical Center, Dr. Molewaterplein 50, 3015 GE Rotterdam, the Netherlands
  - Department of Methodology & Statistics, Maastricht University, PO Box 616, 6200 MD Maastricht, the Netherlands
- Marjan van den Akker
  - Institute of General Practice, Johann Wolfgang Goethe University, Theodor-Stern-Kai 7, D-60590 Frankfurt, Germany
  - Department of Family Medicine, Maastricht University, PO Box 616, 6200 MD Maastricht, the Netherlands
- Erik M. van Mulligen
  - Department of Medical Informatics, Erasmus University Medical Center, Dr. Molewaterplein 50, 3015 GE Rotterdam, the Netherlands
- Jan A. Kors
  - Department of Medical Informatics, Erasmus University Medical Center, Dr. Molewaterplein 50, 3015 GE Rotterdam, the Netherlands
9.
Abstract
Knowledge-based biomedical data science involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge in computer systems, often in the form of knowledge graphs. Here we survey recent progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as progress on approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing to construct knowledge graphs, and the expansion of novel knowledge-based approaches to clinical and biological domains.
Affiliation(s)
- Tiffany J Callahan
  - Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Ignacio J Tripodi
  - Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
- Harrison Pielke-Lombardo
  - Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
- Lawrence E Hunter
  - Computational Bioscience Program and Department of Pharmacology, University of Colorado Denver Anschutz Medical Campus, Aurora, Colorado 80045, USA
10. Du J, Li X. A Knowledge Graph of Combined Drug Therapies Using Semantic Predications From Biomedical Literature: Algorithm Development. JMIR Med Inform 2020; 8:e18323. [PMID: 32343247 PMCID: PMC7218597 DOI: 10.2196/18323] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/21/2020] [Revised: 03/26/2020] [Accepted: 03/29/2020] [Indexed: 11/13/2022] [Open Access]
Abstract
BACKGROUND: Combination therapy plays an important role in the effective treatment of malignant neoplasms and in precision medicine. Numerous clinical studies have been carried out to investigate combination drug therapies. Automated knowledge discovery of these combinations and their graphic representation in knowledge graphs will enable pattern recognition and the identification of drug combinations used to treat a specific type of cancer, and will improve drug efficacy and the treatment of human disorders.

OBJECTIVE: This paper aims to develop an automated, visual approach to discover knowledge about combination therapies from the biomedical literature, especially from studies with high-level evidence such as clinical trial reports and clinical practice guidelines.

METHODS: Based on semantic predications, which consist of a subject-predicate-object (SPO) triple structure, we proposed an automated algorithm to discover knowledge of combination drug therapies using the following rules: (1) two or more semantic predications (S1-P-O and Si-P-O, i = 2, 3…) can be extracted from one conclusive claim (sentence) in the abstract of a given publication, and (2) these predications have an identical predicate (one closely related to human disease treatment, e.g., "treat") and object (e.g., a disease name) but different subjects (e.g., drug names). A customized knowledge graph organizes and visualizes these combinations, improving on traditional semantic triples. After automatic filtering of broad concepts such as "pharmacologic actions" and generic disease names, a set of combination drug therapies was identified and characterized through manual interpretation.

RESULTS: We retrieved 22,263 clinical trial reports and 31 clinical practice guidelines from PubMed abstracts by searching "antineoplastic agents" for drug restriction (published between Jan 2009 and Oct 2019). A total of 15,603 conclusive claims were parsed locally using the search terms "conclusion*" and "conclude*" for semantic predication extraction by SemRep, and 325 candidate groups of semantic predications about combined medications were automatically discovered within 316 conclusive claims. Based on manual analysis, we determined that 255/316 claims (78.46%) were accurately identified as describing combination therapies and adopted these to construct the customized knowledge graph. We also identified two categories (and 4 subcategories) to characterize the inaccurate results: limitations of SemRep and limitations of the proposed approach. We further learned the predominant patterns of drug combinations based on mechanism of action for new combined medication studies, and discovered 4 obvious markers ("combin*," "coadministration," "co-administered," and "regimen") for identifying potential combination therapies, to enable the development of a machine learning algorithm.

CONCLUSIONS: Semantic predications from conclusive claims in the biomedical literature can be used to support automated knowledge discovery and knowledge graph construction for combination therapies. A machine learning approach is warranted to take full advantage of the identified markers and other contextual features.
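The two grouping rules in METHODS translate naturally into code. The Python sketch below groups predications by (claim, predicate, object) and keeps groups with at least two distinct subjects. The `Predication` record, the `TREATS` predicate whitelist, and the example drug-disease predications are assumptions for illustration (SemRep output would first have to be parsed into this shape); the broad-concept filtering and manual interpretation steps are not modeled, so this is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Predication(NamedTuple):
    claim_id: str   # conclusive sentence the triple was extracted from
    subject: str    # e.g. a drug name
    predicate: str  # e.g. "TREATS"
    obj: str        # e.g. a disease name


# Hypothetical whitelist of treatment-related predicates.
TREATMENT_PREDICATES = {"TREATS"}


def find_combinations(preds: List[Predication]) -> Dict[Tuple[str, str, str], Set[str]]:
    """Rule 1: same claim; rule 2: same predicate and object, different subjects."""
    groups: Dict[Tuple[str, str, str], Set[str]] = defaultdict(set)
    for p in preds:
        if p.predicate in TREATMENT_PREDICATES:
            groups[(p.claim_id, p.predicate, p.obj)].add(p.subject)
    # Keep only groups with at least two distinct subjects (drugs).
    return {key: subs for key, subs in groups.items() if len(subs) >= 2}


# Hypothetical predications, for illustration only.
preds = [
    Predication("c1", "Cisplatin", "TREATS", "Non-small cell lung cancer"),
    Predication("c1", "Pemetrexed", "TREATS", "Non-small cell lung cancer"),
    Predication("c2", "Imatinib", "TREATS", "Chronic myeloid leukemia"),
]
for (claim, pred, disease), drugs in find_combinations(preds).items():
    print(claim, pred, disease, sorted(drugs))
# c1 TREATS Non-small cell lung cancer ['Cisplatin', 'Pemetrexed']
```

Each surviving group corresponds to one candidate combination therapy and could become a node cluster in the customized knowledge graph.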
Affiliation(s)
- Jian Du
  - National Institute of Health Data Science, Peking University, Beijing, China
- Xiaoying Li
  - Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
11. van Vlijmen H, Mons A, Waalkens A, Franke W, Baak A, Ruiter G, Kirkpatrick C, da Silva Santos LOB, Meerman B, Jellema R, Arts D, Kersloot M, Knijnenburg S, Lusher S, Verbeeck R, Neefs JM. The Need of Industry to Go FAIR. Data Intell 2020. [DOI: 10.1162/dint_a_00050] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/15/2022] [Open Access]
Abstract
The industry sector is a very large producer and consumer of data, and many companies traditionally focused on production or manufacturing now rely on the analysis of large amounts of data to develop new products and services. As many of the data sources needed are distributed and outside the company, FAIR data will have a major impact, both by reducing existing internal data silos and by enabling efficient integration with external (public and commercial) data. Many companies are still in the early phases of internal data "FAIRification", which provides opportunities for SMEs and academics to apply and develop their expertise on FAIR data in collaborations and public-private partnerships. For a global Internet of FAIR Data & Services that also involves industry to thrive, professional tools and services are essential. FAIR metrics and certifications for individuals, data, organizations, and software must ensure that data producers and consumers have independent quality metrics on their data. In this opinion article we reflect on some industry-specific challenges of FAIR implementation that must be dealt with when choices are made regarding "Industry GOing FAIR".
Affiliation(s)
- Arne Waalkens
  - Accenture, Gustav Mahlerplein 90, 1082 MA Amsterdam, The Netherlands
- Wouter Franke
  - Zorg Instituut Nederland, Willem Dudokhof 1, 1112 ZA Diemen, The Netherlands
- Arie Baak
  - Euretos, Yalelaan 1, 3584 CL Utrecht, The Netherlands
- Gerbrand Ruiter
  - Mobiquity, Tommaso Albinonistraat 9, 1083 HM Amsterdam, The Netherlands
- Bert Meerman
  - GO FAIR Foundation, Rijnsburgerweg 10, 2333 AA Leiden, The Netherlands
- Renger Jellema
  - DSM Biotechnology Center, Alexander Fleminglaan 1, 2613 AX Delft, The Netherlands
- Derk Arts
  - Castor, Paasheuvelweg 25, Vleugel 5D, 1105 BP Amsterdam, The Netherlands
- Martijn Kersloot
  - Castor, Paasheuvelweg 25, Vleugel 5D, 1105 BP Amsterdam, The Netherlands
- Scott Lusher
  - Janssen Pharmaceuticals, Antwerpseweg 15, 2340 Beerse, Belgium
- Rudi Verbeeck
  - Janssen Pharmaceuticals, Antwerpseweg 15, 2340 Beerse, Belgium
- Jean-Marc Neefs
  - Janssen Pharmaceuticals, Antwerpseweg 15, 2340 Beerse, Belgium