1
Stear BJ, Mohseni Ahooyi T, Simmons JA, Kollar C, Hartman L, Beigel K, Lahiri A, Vasisht S, Callahan TJ, Nemarich CM, Silverstein JC, Taylor DM. Petagraph: A large-scale unifying knowledge graph framework for integrating biomolecular and biomedical data. Sci Data 2024; 11:1338. [PMID: 39695169 PMCID: PMC11655564 DOI: 10.1038/s41597-024-04070-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Accepted: 11/04/2024] [Indexed: 12/20/2024]
Abstract
Over the past decade, there has been substantial growth in both the quantity and complexity of available biomedical data. In order to more efficiently harness this extensive data and alleviate challenges associated with integration of multi-omics data, we developed Petagraph, a biomedical knowledge graph that encompasses over 32 million nodes and 118 million relationships. Petagraph leverages more than 180 ontologies and standards in the Unified Biomedical Knowledge Graph (UBKG) to embed millions of quantitative genomics data points. Petagraph provides a cohesive data environment that enables users to efficiently analyze, annotate, and discern relationships within and across complex multi-omics datasets supported by UBKG's annotation scaffold. We demonstrate how queries on Petagraph can generate meaningful results across various research contexts and use cases.
Affiliation(s)
- Benjamin J Stear
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Taha Mohseni Ahooyi
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- J Alan Simmons
- Department of Biomedical Informatics, School of Medicine, The University of Pittsburgh, Pittsburgh, PA, USA
- Charles Kollar
- Department of Biomedical Informatics, School of Medicine, The University of Pittsburgh, Pittsburgh, PA, USA
- Lance Hartman
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Katherine Beigel
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Aditya Lahiri
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Shubha Vasisht
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Tiffany J Callahan
- Department of Biomedical Informatics, Columbia University Irving Medical Campus, New York, NY, USA
- Christopher M Nemarich
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Jonathan C Silverstein
- Department of Biomedical Informatics, School of Medicine, The University of Pittsburgh, Pittsburgh, PA, USA
- Deanne M Taylor
- Department of Biomedical and Health Informatics (DBHI), The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman Medical School, Philadelphia, PA, USA
2
Gualdi F, Oliva B, Piñero J. Predicting gene disease associations with knowledge graph embeddings for diseases with curtailed information. NAR Genom Bioinform 2024; 6:lqae049. [PMID: 38745993 PMCID: PMC11091931 DOI: 10.1093/nargab/lqae049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 03/08/2024] [Accepted: 04/24/2024] [Indexed: 05/16/2024]
Abstract
Knowledge graph embeddings (KGE) are a powerful technique used in the biomedical domain to represent biological knowledge in a low dimensional space. However, a deep understanding of these methods is still missing, and, in particular, regarding their applications to prioritize genes associated with complex diseases with reduced genetic information. In this contribution, we built a knowledge graph (KG) by integrating heterogeneous biomedical data and generated KGE by implementing state-of-the-art methods, and two novel algorithms: Dlemb and BioKG2vec. Extensive testing of the embeddings with unsupervised clustering and supervised methods showed that KGE can be successfully implemented to predict genes associated with diseases and that our novel approaches outperform most existing algorithms in both scenarios. Our findings underscore the significance of data quality, preprocessing, and integration in achieving accurate predictions. Additionally, we applied KGE to predict genes linked to Intervertebral Disc Degeneration (IDD) and illustrated that functions pertinent to the disease are enriched within the prioritized gene set.
Affiliation(s)
- Francesco Gualdi
- Integrative Biomedical Informatics, Research Programme on Biomedical Informatics (IBI-GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Structural Bioinformatics Lab, Research Programme on Biomedical Informatics (SBI-GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Baldomero Oliva
- Structural Bioinformatics Lab, Research Programme on Biomedical Informatics (SBI-GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Janet Piñero
- Integrative Biomedical Informatics, Research Programme on Biomedical Informatics (IBI-GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain
- Medbioinformatics Solutions SL, Barcelona, Spain
3
Sosa DN, Neculae G, Fauqueur J, Altman RB. Elucidating the semantics-topology trade-off for knowledge inference-based pharmacological discovery. J Biomed Semantics 2024; 15:5. [PMID: 38693563 PMCID: PMC11064343 DOI: 10.1186/s13326-024-00308-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 04/21/2024] [Indexed: 05/03/2024]
Abstract
Leveraging AI for synthesizing the deluge of biomedical knowledge has great potential for pharmacological discovery with applications including developing new therapeutics for untreated diseases and repurposing drugs as emergent pandemic treatments. Creating knowledge graph representations of interacting drugs, diseases, genes, and proteins enables discovery via embedding-based ML approaches and link prediction. Previously, it has been shown that these predictive methods are susceptible to biases from network structure, namely that they are driven not by discovering nuanced biological understanding of mechanisms, but based on high-degree hub nodes. In this work, we study the confounding effect of network topology on biological relation semantics by creating an experimental pipeline of knowledge graph semantic and topological perturbations. We show that the drop in drug repurposing performance from ablating meaningful semantics increases by 21% and 38% when mitigating topological bias in two networks. We demonstrate that new methods for representing knowledge and inferring new knowledge must be developed for making use of biomedical semantics for pharmacological innovation, and we suggest fruitful avenues for their development.
Affiliation(s)
- Daniel N Sosa
- Stanford University, Department of Biomedical Data Science, Stanford, CA, USA
- Russ B Altman
- Stanford University, Department of Bioengineering, Stanford, CA, USA
- Stanford University, Department of Genetics, Stanford, CA, USA
4
Wang X, Yang K, Jia T, Gu F, Wang C, Xu K, Shu Z, Xia J, Zhu Q, Zhou X. KDGene: knowledge graph completion for disease gene prediction using interactional tensor decomposition. Brief Bioinform 2024; 25:bbae161. [PMID: 38605639 PMCID: PMC11009469 DOI: 10.1093/bib/bbae161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 02/20/2024] [Accepted: 03/13/2024] [Indexed: 04/13/2024]
Abstract
The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating their effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, from which the results are promising to provide valuable references for further wet experiments. Data and source codes are available at https://github.com/2020MEAI/KDGene.
Affiliation(s)
- Kuo Yang
- Corresponding author: Kuo Yang and Xuezhong Zhou, Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Xuezhong Zhou
- Corresponding author: Kuo Yang and Xuezhong Zhou, Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
5
Zhang D, Zhao R, Xian G, Kou Y, Ma W. A new model construction based on the knowledge graph for mining elite polyphenotype genes in crops. Front Plant Sci 2024; 15:1361716. [PMID: 38571713 PMCID: PMC10987776 DOI: 10.3389/fpls.2024.1361716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Accepted: 03/04/2024] [Indexed: 04/05/2024]
Abstract
Identifying polyphenotype genes that simultaneously regulate important agronomic traits (e.g., plant height, yield, and disease resistance) is critical for developing novel high-quality crop varieties. Predicting the associations between genes and traits requires the organization and analysis of multi-dimensional scientific data. The existing methods for establishing the relationships between genomic data and phenotypic data can only elucidate the associations between genes and individual traits. However, there are relatively few methods for detecting elite polyphenotype genes. In this study, a knowledge graph for traits regulating-genes was constructed by collecting data from the PubMed database and eight other databases related to the staple food crops rice, maize, and wheat as well as the model plant Arabidopsis thaliana. On the basis of the knowledge graph, a model for predicting traits regulating-genes was constructed by combining the data attributes of the gene nodes and the topological relationship attributes of the gene nodes. Additionally, a scoring method for predicting the genes regulating specific traits was developed to screen for elite polyphenotype genes. A total of 125,591 nodes and 547,224 semantic relationships were included in the knowledge graph. The accuracy of the knowledge graph-based model for predicting traits regulating-genes was 0.89, the precision rate was 0.91, the recall rate was 0.96, and the F1 value was 0.94. Moreover, 4,447 polyphenotype genes for 31 trait combinations were identified, among which the rice polyphenotype gene IPA1 and the A. thaliana polyphenotype gene CUC2 were verified via a literature search. Furthermore, the wheat gene TraesCS5A02G275900 was revealed as a potential polyphenotype gene that will need to be further characterized. 
Meanwhile, Venn diagram analysis of the polyphenotype gene datasets (genes predicted by our model) and the transcriptome gene datasets (genes differentially expressed in response to disease, drought, or salt) showed that approximately 70% and 54% of the polyphenotype genes were identified in the transcriptome datasets of Arabidopsis and rice, respectively. The application of the knowledge graph-driven model for predicting traits regulating-genes represents a novel method for detecting elite polyphenotype genes.
Affiliation(s)
- Dandan Zhang
- Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
- Ruixue Zhao
- Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
- Key Laboratory of Agricultural Integration Publishing Knowledge Mining and Knowledge Service, National Press and Publication Administration, Beijing, China
- Guojian Xian
- Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
- Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing, China
- Yuantao Kou
- Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
- Key Laboratory of Agricultural Integration Publishing Knowledge Mining and Knowledge Service, National Press and Publication Administration, Beijing, China
- Weilu Ma
- Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, China
6
Zhang Y, Chen G, Zhou S, He L, Ayanniyi OO, Xu Q, Yue Z, Yang C. APDDD: Animal parasitic diseases and drugs database. Comp Immunol Microbiol Infect Dis 2024; 104:102096. [PMID: 38000324 DOI: 10.1016/j.cimid.2023.102096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 10/26/2023] [Accepted: 11/13/2023] [Indexed: 11/26/2023]
Abstract
Animal parasitic diseases not only have an economic impact, but also have serious social and public health impacts. Although antiparasitic drugs can treat these diseases, it seems difficult for users to comprehensively utilize the information, due to incomplete and difficult data collection. Thus, there is an urgent need to establish a comprehensive database that includes parasitic diseases and related drugs. In this paper, we develop a knowledge database dedicated to collecting and analyzing animal parasitic diseases and related drugs, named Animal Parasitic Diseases and Drugs Database (APDDD). The current version of APDDD includes animal parasitic disease data of 8 major parasite classifications that cause common parasitic diseases and 96 subclass samples mined from the literature and authoritative books, as well as 182 antiparasitic drugs. Furthermore, we utilized APDDD data to add a knowledge graph representing the relationships between parasitic diseases, drugs, and the genes targeted by drugs acting on parasites. We hope that APDDD will become a good database for animal parasitic diseases and antiparasitic drugs research and that users can gain a more intuitive understanding of the relationships between parasitic diseases, drugs, and targeted genes through the knowledge graph.
Affiliation(s)
- Yilei Zhang
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China
- Guojun Chen
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China
- Siyi Zhou
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China
- Lingru He
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China
- Olalekan Opeyemi Ayanniyi
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China
- Qianming Xu
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China
- Zhenyu Yue
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China
- Congshan Yang
- College of Animal Science and Technology, School of Information and Computer, Anhui Agricultural University, Hefei, Anhui Province 230036, China; Anhui Province Key Laboratory of Veterinary Pathobiology and Disease Control, College of Animal Science and Technology, Anhui Agricultural University, Hefei 230036, China
7
Daza D, Alivanistos D, Mitra P, Pijnenburg T, Cochez M, Groth P. BioBLP: a modular framework for learning on multimodal biomedical knowledge graphs. J Biomed Semantics 2023; 14:20. [PMID: 38066573 PMCID: PMC10709903 DOI: 10.1186/s13326-023-00301-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences, or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. OBJECTIVE We aim to understand how to incorporate multimodal data into biomedical KG embeddings, and analyze the resulting performance in comparison with traditional methods. We propose a modular framework for learning embeddings in KGs with entity attributes, that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction, and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. RESULTS In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. Further analyses show that incorporating attribute data does outperform baselines over entities below a certain node degree, comprising approximately 75% of the diseases in the graph. We also observe that optimizing attribute encoders is a challenging task that increases optimization costs. 
Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. CONCLUSION BioBLP makes it possible to investigate different ways of incorporating multimodal biomedical data for learning representations in KGs. With a particular implementation, we find that incorporating attribute data does not consistently outperform baselines, but improvements are obtained on a comparatively large subset of entities below a specific node degree. Our results indicate a potential for improved performance in scientific discovery tasks where understudied areas of the KG would benefit from link prediction methods.
Affiliation(s)
- Daniel Daza
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
- University of Amsterdam, Amsterdam, The Netherlands.
- Discovery Lab, Elsevier, Amsterdam, The Netherlands.
- Dimitrios Alivanistos
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Michael Cochez
- Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
- Paul Groth
- University of Amsterdam, Amsterdam, The Netherlands
- Discovery Lab, Elsevier, Amsterdam, The Netherlands
8
Ratajczak F, Joblin M, Hildebrandt M, Ringsquandl M, Falter-Braun P, Heinig M. Speos: an ensemble graph representation learning framework to predict core gene candidates for complex diseases. Nat Commun 2023; 14:7206. [PMID: 37938585 PMCID: PMC10632370 DOI: 10.1038/s41467-023-42975-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/27/2023] [Indexed: 11/09/2023]
Abstract
Understanding phenotype-to-genotype relationships is a grand challenge of 21st century biology with translational implications. The recently proposed "omnigenic" model postulates that effects of genetic variation on traits are mediated by core-genes and -proteins whose activities mechanistically influence the phenotype, whereas peripheral genes encode a regulatory network that indirectly affects phenotypes via core gene products. Here, we develop a positive-unlabeled graph representation-learning ensemble-approach based on a nested cross-validation to predict core-like genes for diverse diseases using Mendelian disorder genes for training. Employing mouse knockout phenotypes for external validations, we demonstrate that core-like genes display several key properties of core genes: Mouse knockouts of genes corresponding to our most confident predictions give rise to relevant mouse phenotypes at rates on par with the Mendelian disorder genes, and all candidates exhibit core gene properties like transcriptional deregulation in disease and loss-of-function intolerance. Moreover, as predicted for core genes, our candidates are enriched for drug targets and druggable proteins. In contrast to Mendelian disorder genes the new core-like genes are enriched for druggable yet untargeted gene products, which are therefore attractive targets for drug development. Interpretation of the underlying deep learning model suggests plausible explanations for our core gene predictions in form of molecular mechanisms and physical interactions. Our results demonstrate the potential of graph representation learning for the interpretation of biological complexity and pave the way for studying core gene properties and future drug development.
Affiliation(s)
- Florin Ratajczak
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany
- Pascal Falter-Braun
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany
- Microbe-Host Interactions, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Matthias Heinig
- Institute of Computational Biology (ICB), Helmholtz Munich, Neuherberg, Germany
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- German Centre for Cardiovascular Research (DZHK), Munich Heart Association, Partner Site Munich, Berlin, Germany
9
Lee H, Jeon J, Jung D, Won JI, Kim K, Kim YJ, Yoon J. RelCurator: a text mining-based curation system for extracting gene-phenotype relationships specific to neurodegenerative disorders. Genes Genomics 2023; 45:1025-1036. [PMID: 37300788 DOI: 10.1007/s13258-023-01405-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 05/18/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND The identification of gene-phenotype relationships is important in medical genetics as it serves as a basis for precision medicine. However, most of the gene-phenotype relationship data are buried in the biomedical literature in textual form. OBJECTIVE We propose RelCurator, a curation system that extracts sentences including both gene and phenotype entities related to specific disease categories from PubMed articles, provides rich additional information such as entity taggings, and predictions of gene-phenotype relationships. METHODS We targeted neurodegenerative disorders and developed a deep learning model using Bidirectional Gated Recurrent Unit (BiGRU) networks and BioWordVec word embeddings for predicting gene-phenotype relationships from biomedical texts. The prediction model is trained with more than 130,000 labeled PubMed sentences including gene and phenotype entities, which are related to or unrelated to neurodegenerative disorders. RESULTS We compared the performance of our deep learning model with those of Bidirectional Encoder Representations from Transformers (BERT), Support Vector Machine (SVM), and simple Recurrent Neural Network (simple RNN) models. Our model performed better with an F1-score of 0.96. Furthermore, the evaluation done using a few curation cases in the real scenario showed the effectiveness of our work. Therefore, we conclude that RelCurator can identify not only new causative genes, but also new genes associated with neurodegenerative disorders' phenotype. CONCLUSION RelCurator is a user-friendly method for accessing deep learning-based supporting information and a concise web interface to assist curators while browsing the PubMed articles. Our curation process represents an important and broadly applicable improvement to the state of the art for the curation of gene-phenotype relationships.
Affiliation(s)
- Heonwoo Lee
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
- Junbeom Jeon
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
- Dawoon Jung
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
- Jung-Im Won
- Center for Innovation in Engineering Education, Hanyang University, Seoul, Republic of Korea
- Kiyong Kim
- Department of Electronic Engineering, Kyonggi University, Suwon, Republic of Korea
- Yun Joong Kim
- Department of Neurology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Department of Neurology, Yongin Severance Hospital, Yonsei University College of Medicine, Yonsei University Health System, Yongin, Gyeonggi-do, 16995, Republic of Korea
- Jeehee Yoon
- Department of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, 200-702, Republic of Korea
10
Abu-Salih B, AL-Qurishi M, Alweshah M, AL-Smadi M, Alfayez R, Saadeh H. Healthcare knowledge graph construction: A systematic review of the state-of-the-art, open issues, and opportunities. J Big Data 2023; 10:81. [PMID: 37274445 PMCID: PMC10225120 DOI: 10.1186/s40537-023-00774-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/28/2022] [Accepted: 05/17/2023] [Indexed: 06/06/2023]
Abstract
The incorporation of data analytics in the healthcare industry has made significant progress, driven by the demand for efficient and effective big data analytics solutions. Knowledge graphs (KGs) have proven utility in this arena and are rooted in a number of healthcare applications to furnish better data representation and knowledge inference. However, in conjunction with a lack of a representative KG construction taxonomy, several existing approaches in this designated domain are inadequate and inferior. This paper is the first to provide a comprehensive taxonomy and a bird's eye view of healthcare KG construction. Additionally, a thorough examination of the current state-of-the-art techniques drawn from academic works relevant to various healthcare contexts is carried out. These techniques are critically evaluated in terms of methods used for knowledge extraction, types of the knowledge base and sources, and the incorporated evaluation protocols. Finally, several research findings and existing issues in the literature are reported and discussed, opening horizons for future research in this vibrant area.
Affiliation(s)
- Mohammad AL-Smadi
- Jordan University of Science and Technology, Irbid, Jordan
- Qatar University, Doha, Qatar
11
Mangione W, Falls Z, Samudrala R. Effective holistic characterization of small molecule effects using heterogeneous biological networks. Front Pharmacol 2023; 14:1113007. [PMID: 37180722 PMCID: PMC10169664 DOI: 10.3389/fphar.2023.1113007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 04/11/2023] [Indexed: 05/16/2023]
Abstract
The two most common reasons for attrition in therapeutic clinical trials are efficacy and safety. We integrated heterogeneous data to create a human interactome network to comprehensively describe drug behavior in biological systems, with the goal of accurate therapeutic candidate generation. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multiscale therapeutic discovery, repurposing, and design was enhanced by integrating drug side effects, protein pathways, protein-protein interactions, protein-disease associations, and the Gene Ontology, and complemented with its existing drug/compound, protein, and indication libraries. These integrated networks were reduced to a "multiscale interactomic signature" for each compound that describe its functional behavior as vectors of real values. These signatures are then used for relating compounds to each other with the hypothesis that similar signatures yield similar behavior. Our results indicated that there is significant biological information captured within our networks (particularly via side effects) which enhance the performance of our platform, as evaluated by performing all-against-all leave-one-out drug-indication association benchmarking as well as generating novel drug candidates for colon cancer and migraine disorders corroborated via literature search. Further, drug impacts on pathways derived from computed compound-protein interaction scores served as the features for a random forest machine learning model trained to predict drug-indication associations, with applications to mental disorders and cancer metastasis highlighted. This interactomic pipeline highlights the ability of Computational Analysis of Novel Drug Opportunities to accurately relate drugs in a multitarget and multiscale context, particularly for generating putative drug candidates using the information gleaned from indirect data such as side effect profiles and protein pathway information.
Affiliation(s)
- Ram Samudrala
- Jacobs School of Medicine and Biomedical Sciences, Department of Biomedical Informatics, University at Buffalo, Buffalo, NY, United States
12
Wang H, Wang X, Liu W, Xie X, Peng S. deepDGA: Biomedical Heterogeneous Network-based Deep Learning Framework for Disease-Gene Association Predictions. 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2022:601-606. [DOI: 10.1109/bibm55620.2022.9995651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
Affiliation(s)
- Hong Wang
- Hunan University, College of Computer Science and Electronic Engineering, Changsha, China
- Xiaoqi Wang
- Hunan University, College of Computer Science and Electronic Engineering, Changsha, China
- Wenjuan Liu
- Hunan University, College of Computer Science and Electronic Engineering, Changsha, China
- Xiaolan Xie
- Guilin University of Technology, College of Information Science and Engineering, Guilin, China
- Shaoliang Peng
- Hunan University, College of Computer Science and Electronic Engineering, Changsha, China
13
A Knowledge Graph Completion Method Applied to Literature-Based Discovery for Predicting Missing Links Targeting Cancer Drug Repurposing. Artif Intell Med 2022. [DOI: 10.1007/978-3-031-09342-5_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]