1
|
Wang K, Miao Y, Wang X, Li Y, Li F, Song H. Research on the construction of a knowledge graph for tomato leaf pests and diseases based on the named entity recognition model. Frontiers in Plant Science 2024; 15:1482275. [PMID: 39574459 PMCID: PMC11578693 DOI: 10.3389/fpls.2024.1482275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2024] [Accepted: 10/22/2024] [Indexed: 11/24/2024]
Abstract
Introduction Tomato leaf pests and diseases pose a significant threat to the yield and quality of tomatoes, highlighting the necessity for comprehensive studies on effective control methods. Methods Current control measures predominantly rely on experience and manual observation, hindering the integration of multi-source data. To address this, we integrated information resources related to tomato leaf pests and diseases from agricultural standards documents, knowledge websites, and relevant literature. Guided by domain experts, we preprocessed this data to construct a sample set. Results We utilized the Named Entity Recognition (NER) model ALBERT-BiLSTM-CRF to conduct end-to-end knowledge extraction experiments, which outperformed traditional models such as 1DCNN-CRF and BiLSTM-CRF, achieving a recall rate of 95.03%. The extracted knowledge was then stored in the Neo4j graph database, effectively visualizing the internal structure of the knowledge graph. Discussion We developed a digital diagnostic system for tomato leaf pests and diseases based on the knowledge graph, enabling graphical management and visualization of pest and disease knowledge. The constructed knowledge graph offers insights for controlling tomato leaf pests and diseases and provides new research directions for pest control in other crops.
Affiliation(s)
- Kun Wang
- Software College, Shanxi Agricultural University, Jinzhong, Shanxi, China
- Yuyuan Miao
- Software College, Shanxi Agricultural University, Jinzhong, Shanxi, China
- Xu Wang
- Software College, Shanxi Agricultural University, Jinzhong, Shanxi, China
- Yuze Li
- Software College, Shanxi Agricultural University, Jinzhong, Shanxi, China
- Fuzhong Li
- Software College, Shanxi Agricultural University, Jinzhong, Shanxi, China
- Haiyan Song
- Agricultural Engineering College, Shanxi Agricultural University, Jinzhong, Shanxi, China
|
2
|
Xu T, Li B, Chen L, Yang C, Gu Y, Gu X. EHR coding with hybrid attention and features propagation on disease knowledge graph. Artif Intell Med 2024; 154:102916. [PMID: 38909432 DOI: 10.1016/j.artmed.2024.102916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 04/08/2024] [Accepted: 06/17/2024] [Indexed: 06/25/2024]
Abstract
Electronic health record (EHR) coding assigns International Classification of Diseases (ICD) codes to each EHR document. These standard medical codes represent diagnoses or procedures and play a critical role in medical applications. However, EHR is a long medical text that is difficult to represent, the ICD code label space is large, and the labels have an extremely unbalanced distribution. These factors pose challenges to automatic EHR coding. Previous studies have not explored the disease attributes (e.g., symptoms, tests, medications) of ICD codes and the disease relationships (e.g., causes, risk factors, comorbidities) between them. In addition, the important roles of medical [...] and sentences associated with these attributes and relationships have been neglected. In this paper, we propose an end-to-end model called Knowledge Graph Enhanced neural network (KGENet) to address the above shortcomings. Specifically, we first construct a disease knowledge graph that focuses on the multi-view disease attributes of ICD codes and the disease relationships between these codes. We also use a long sequence encoder to get EHR document representation. Most importantly, KGENet leverages multi-view disease attributes and structured disease relationships for knowledge enhancement through hybrid attention and graph propagation, respectively. Furthermore, the above processes can provide attribute-aware and relationship-augmented explainability for the model prediction results based on our disease knowledge graph. Experiments conducted on the MIMIC-III benchmark dataset show that KGENet outperforms state-of-the-art models in both model effectiveness and explainability.
Affiliation(s)
- Tianhan Xu
- School of Information Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China; Jiangsu Province Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou, 225127, Jiangsu, China
- Bin Li
- School of Information Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China; Jiangsu Province Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou, 225127, Jiangsu, China
- Ling Chen
- School of Information Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China; Jiangsu Province Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou, 225127, Jiangsu, China
- Chao Yang
- School of Information Engineering, Yangzhou University, Yangzhou, 225127, Jiangsu, China; Jiangsu Province Engineering Research Center of Knowledge Management and Intelligent Service, Yangzhou, 225127, Jiangsu, China
- Yixun Gu
- Department of Oncology, Northern Jiangsu Province People Hospital of Yangzhou University, Yangzhou, 225001, Jiangsu, China
- Xiang Gu
- Department of Cardiovascular, Northern Jiangsu Province People Hospital of Yangzhou University, Yangzhou, 225001, Jiangsu, China
|
3
|
Walke D, Micheel D, Schallert K, Muth T, Broneske D, Saake G, Heyer R. The importance of graph databases and graph learning for clinical applications. Database (Oxford) 2023; 2023:baad045. [PMID: 37428679 PMCID: PMC10332447 DOI: 10.1093/database/baad045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 05/26/2023] [Accepted: 06/16/2023] [Indexed: 07/12/2023]
Abstract
The increasing amount and complexity of clinical data require an appropriate way of storing and analyzing those data. Traditional approaches use a tabular structure (relational databases) for storing data and thereby complicate storing and retrieving interlinked data from the clinical domain. Graph databases provide a great solution for this by storing data in a graph as nodes (vertices) that are connected by edges (links). The underlying graph structure can be used for the subsequent data analysis (graph learning). Graph learning consists of two parts: graph representation learning and graph analytics. Graph representation learning aims to reduce high-dimensional input graphs to low-dimensional representations. Then, graph analytics uses the obtained representations for analytical tasks like visualization, classification, link prediction and clustering which can be used to solve domain-specific problems. In this survey, we review current state-of-the-art graph database management systems, graph learning algorithms and a variety of graph applications in the clinical domain. Furthermore, we provide a comprehensive use case for a clearer understanding of complex graph learning algorithms.
Affiliation(s)
- Daniel Walke
- Bioprocess Engineering, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Database and Software Engineering Group, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Daniel Micheel
- Database and Software Engineering Group, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Kay Schallert
- Multidimensional Omics Analyses Group, Leibniz-Institut für Analytische Wissenschaften—ISAS—e.V., Bunsen-Kirchhoff-Straße 11, Dortmund 44139, Germany
- Thilo Muth
- Section eScience (S.3), Federal Institute for Materials Research and Testing (BAM), Unter den Eichen 87, Berlin 12205, Germany
- David Broneske
- Infrastructure and Methods, German Center for Higher Education Research and Science Studies (DZHW), Lange Laube 12, Hannover 30159, Germany
- Gunter Saake
- Database and Software Engineering Group, Otto von Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Robert Heyer
- Multidimensional Omics Analyses Group, Leibniz-Institut für Analytische Wissenschaften—ISAS—e.V., Bunsen-Kirchhoff-Straße 11, Dortmund 44139, Germany
- Faculty of Technology, Bielefeld University, Universitätsstraße 25, Bielefeld 33615, Germany
|
4
|
Murali L, Gopakumar G, Viswanathan DM, Nedungadi P. Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study. J Biomed Inform 2023:104403. [PMID: 37230406 DOI: 10.1016/j.jbi.2023.104403] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/13/2023] [Revised: 05/16/2023] [Accepted: 05/19/2023] [Indexed: 05/27/2023]
Abstract
With the growth of data and intelligent technologies, the healthcare sector has opened up numerous technology-enabled services for patients, clinicians, and researchers. One major hurdle in achieving state-of-the-art results in health informatics is domain-specific terminologies and their semantic complexities. A knowledge graph crafted from medical concepts, events, and relationships acts as a medical semantic network to extract new links and hidden patterns from health data sources. Current medical knowledge graph construction studies are limited to generic techniques and opportunities and focus less on exploiting real-world data sources in knowledge graph construction. A knowledge graph constructed from Electronic Health Records (EHR) data obtains real-world data from healthcare records. It ensures better results in subsequent tasks like knowledge extraction and inference, knowledge graph completion, and medical knowledge graph applications such as diagnosis predictions, clinical recommendations, and clinical decision support. This review critically analyses existing works on medical knowledge graphs that used EHR data as the data source at the (i) representation level, (ii) extraction level, and (iii) completion level. In this investigation, we found that EHR-based knowledge graph construction involves challenges such as high complexity and dimensionality of data, lack of knowledge fusion, and dynamic update of the knowledge graph. In addition, the study presents possible ways to tackle the challenges identified. Our findings conclude that future research should focus on knowledge graph integration and knowledge graph completion challenges.
Affiliation(s)
- Lino Murali
- Center for Research in Analytics and Technologies for Education (CREATE), Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India; Division of Information technology, School of Engineering, Cochin University of Science and Technology, Kochi, 682022, Kerala, India
- G Gopakumar
- Department of Computer Science and Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India
- Daleesha M Viswanathan
- Division of Information technology, School of Engineering, Cochin University of Science and Technology, Kochi, 682022, Kerala, India
- Prema Nedungadi
- Center for Research in Analytics and Technologies for Education (CREATE), Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India; Department of Computer Science and Engineering, School of Computing, Amrita Vishwa Vidyapeetham, Amritapuri, Kollam, 690525, Kerala, India
|
5
|
LPG-Based Knowledge Graphs: A Survey, a Proposal and Current Trends. Information 2023. [DOI: 10.3390/info14030154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
A significant part of the current research in the field of Artificial Intelligence is devoted to knowledge bases. New techniques and methodologies are emerging every day for the storage and maintenance of, and reasoning over, knowledge bases. Recently, the most common way of representing knowledge bases is by means of graph structures. More specifically, according to the Semantic Web perspective, many knowledge sources are in the form of a graph adopting the Resource Description Framework model. At the same time, graphs have also started to gain momentum as a model for databases. Graph DBMSs, such as Neo4j, adopt the Labeled Property Graph model. Many works have tried to merge these two perspectives. In this paper, we overview different proposals aimed at combining these two aspects, focusing especially on the possibility of adding reasoning capabilities. In doing this, we show current trends, issues and possible solutions. In this context, we describe our proposal and its novelties with respect to the current state of the art, highlighting its current status, potential, methodology, and prospects.
|
6
|
Zhao G, Gu W, Cai W, Zhao Z, Zhang X, Liu J. MLEE: A method for extracting object-level medical knowledge graph entities from Chinese clinical records. Front Genet 2022; 13:900242. [PMID: 35938002 PMCID: PMC9354090 DOI: 10.3389/fgene.2022.900242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Accepted: 06/16/2022] [Indexed: 11/13/2022]
Abstract
As a typical knowledge-intensive industry, the medical field uses knowledge graph technology to construct causal inference calculations, such as “symptom-disease”, “laboratory examination/imaging examination-disease”, and “disease-treatment method”. The continuous expansion of large electronic clinical records provides an opportunity to learn medical knowledge by machine learning. In this process, how to extract entities with a medical logic structure and how to make entity extraction more consistent with the logic of the text content in electronic clinical records are two issues that have become key in building a high-quality medical knowledge graph. In this work, we describe a method for extracting medical entities using real Chinese electronic clinical records. We define a computational architecture named MLEE to extract object-level entities with “object-attribute” dependencies. We conducted experiments based on randomly selected electronic clinical records of 1,000 patients from Shengjing Hospital of China Medical University to verify the effectiveness of the method.
Affiliation(s)
- Genghong Zhao
- School of Computer Science and Engineering Northeastern University, Shenyang, China
- Neusoft Research of Intelligent Healthcare Technology, Shenyang, China
- Wenjian Gu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Wei Cai
- Neusoft Research of Intelligent Healthcare Technology, Shenyang, China
- Zhiying Zhao
- Department of Clinical Epidemiology, Shengjing Hospital of China Medical University, Shenyang, China
- Xia Zhang
- School of Computer Science and Engineering Northeastern University, Shenyang, China
- Neusoft Research of Intelligent Healthcare Technology, Shenyang, China
- Jiren Liu
- School of Computer Science and Engineering Northeastern University, Shenyang, China
- Neusoft Corporation, Shenyang, China
- *Correspondence: Genghong Zhao; Xia Zhang; Jiren Liu
|
7
|
Zhang Z, Yu P, Pai N, Chang HCR, Chen S, Yin M, Song T, Lau SK, Deng C. Developing an Intuitive Graph Representation of Knowledge for Nonpharmacological Treatment of Psychotic Symptoms in Dementia. J Gerontol Nurs 2022; 48:49-55. [PMID: 35343842 DOI: 10.3928/00989134-20220308-02] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
Applying person-centered, nonpharmacological interventions to manage psychotic symptoms of dementia is promoted for health care professionals, particularly gerontological nurses, who are responsible for care of older adults in nursing homes. A knowledge graph is a graph consisting of a set of concepts that are linked together by their interrelationships and has been widely used as a formal representation of domain knowledge in health. However, there is a lack of a knowledge graph for nonpharmacological treatment of psychotic symptoms in dementia. Therefore, we developed a comprehensive, human- and machine-understandable knowledge graph for this domain, named Dementia-Related Psychotic Symptom Nonpharmacological Treatment Ontology (DRPSNPTO). This graph was built by adopting the established NeOn methodology, a knowledge graph engineering method, to meet the quality standards for biomedical knowledge graphs. This intuitive graph representation of the domain knowledge sets a new direction for visualizing and computerizing gerontological knowledge to facilitate human comprehension and build intelligent aged care information systems. [Journal of Gerontological Nursing, 48(4), 49-55.]
|
8
|
Lan G, Liu T, Wang X, Pan X, Huang Z. A semantic web technology index. Sci Rep 2022; 12:3672. [PMID: 35256665 PMCID: PMC8901930 DOI: 10.1038/s41598-022-07615-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/07/2021] [Accepted: 01/13/2022] [Indexed: 01/22/2023]
Abstract
Semantic web (SW) technology has been widely applied to many domains such as medicine, health care, finance, and geology. At present, researchers mainly rely on their experience and preferences to develop and evaluate the work of SW technology. Although the general architecture (e.g., Tim Berners-Lee’s Semantic Web Layer Cake) of SW technology was proposed many years ago and is well known, it still lacks a concrete guideline for standardizing the development of SW technology. In this paper, we propose an SW technology index to standardize the development for ensuring that the work of SW technology is designed well and to quantitatively evaluate the quality of the work in SW technology. This index consists of 10 criteria that quantify the quality as a score of 0-10. We address each criterion in detail for a clear explanation from three aspects: (1) what is the criterion? (2) why do we consider this criterion? and (3) how do the current studies meet this criterion? Finally, we present the validation of this index by providing some examples of how to apply the index to the validation cases. We conclude that the index is a useful standard to guide and evaluate the work in SW technology.
|
9
|
Ayala D, Hernández I, Ruiz D, Rahm E. LEAPME: Learning-based Property Matching with Embeddings. Data Knowl Eng 2022. [DOI: 10.1016/j.datak.2021.101943] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
|
10
|
Leveraging graph-based hierarchical medical entity embedding for healthcare applications. Sci Rep 2021; 11:5858. [PMID: 33712670 PMCID: PMC7955058 DOI: 10.1038/s41598-021-85255-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/11/2020] [Accepted: 02/23/2021] [Indexed: 12/30/2022]
Abstract
Automatic representation learning of key entities in electronic health record (EHR) data is a critical step for healthcare data mining that turns heterogeneous medical records into structured and actionable information. Here we propose ME2Vec, an algorithmic framework for learning continuous low-dimensional embedding vectors of the most common entities in EHR: medical services, doctors, and patients. ME2Vec features a hierarchical structure that encapsulates different node embedding schemes to cater for the unique characteristic of each medical entity. To embed medical services, we employ a biased-random-walk-based node embedding that leverages the irregular time intervals of medical services in EHR to embody their relative importance. To embed doctors and patients, we adhere to the principle “it’s what you do that defines you” and derive their embeddings based on their interactions with other types of entities through graph neural network and proximity-preserving network embedding, respectively. Using real-world clinical data, we demonstrate the efficacy of ME2Vec over competitive baselines on diagnosis prediction, readmission prediction, as well as recommending doctors to patients based on their medical conditions. In addition, medical service embeddings pretrained using ME2Vec can substantially improve the performance of sequential models in predicting patients clinical outcomes. Overall, ME2Vec can serve as a general-purpose representation learning algorithm for EHR data and benefit various downstream tasks in terms of both performance and interpretability.
|
11
|
Franklin JDS, Chari S, Foreman MA, Seneviratne O, Gruen DM, McCusker JP, Das AK, McGuinness DL. Knowledge Extraction of Cohort Characteristics in Research Publications. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2021; 2020:462-471. [PMID: 33936419 PMCID: PMC8075436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
When healthcare providers review the results of a clinical trial study to understand its applicability to their practice, they typically analyze how well the characteristics of the study cohort correspond to those of the patients they see. We have previously created a study cohort ontology to standardize this information and make it accessible for knowledge-based decision support. The extraction of this information from research publications is challenging, however, given the wide variance in reporting cohort characteristics in a tabular representation. To address this issue, we have developed an ontology-enabled knowledge extraction pipeline for automatically constructing knowledge graphs from the cohort characteristics found in PDF-formatted research papers. We evaluated our approach using a training and test set of 41 research publications and found an overall accuracy of 83.3% in correctly assembling the knowledge graphs. Our research provides a promising approach for extracting knowledge more broadly from tabular information in research publications.
|
12
|
Cernile G, Heritage T, Sebire NJ, Gordon B, Schwering T, Kazemlou S, Borecki Y. Network graph representation of COVID-19 scientific publications to aid knowledge discovery. BMJ Health Care Inform 2021; 28:bmjhci-2020-100254. [PMID: 33419870 PMCID: PMC7798427 DOI: 10.1136/bmjhci-2020-100254] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/12/2020] [Revised: 12/01/2020] [Accepted: 12/11/2020] [Indexed: 12/11/2022]
Abstract
Introduction Numerous scientific journal articles related to COVID-19 have been rapidly published, making navigation and understanding of relationships difficult. Methods A graph network was constructed from the publicly available COVID-19 Open Research Dataset (CORD-19) of COVID-19-related publications using an engine leveraging medical knowledge bases to identify discrete medical concepts and an open-source tool (Gephi) to visualise the network. Results The network shows connections between diseases, medications and procedures identified from the title and abstract of 195 958 COVID-19-related publications (CORD-19 Dataset). Connections between terms with few publications, those unconnected to the main network and those irrelevant were not displayed. Nodes were coloured by knowledge base and the size of the node related to the number of publications containing the term. The data set and visualisations were made publicly accessible via a webtool. Conclusion Knowledge management approaches (text mining and graph networks) can effectively allow rapid navigation and exploration of entity inter-relationships to improve understanding of diseases such as COVID-19.
|
13
|
Rossanez A, Dos Reis JC, Torres RDS, de Ribaupierre H. KGen: a knowledge graph generator from biomedical scientific literature. BMC Med Inform Decis Mak 2020; 20:314. [PMID: 33317512 PMCID: PMC7734730 DOI: 10.1186/s12911-020-01341-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/09/2020] [Accepted: 11/17/2020] [Indexed: 11/26/2022]
Abstract
Background Knowledge is often produced from data generated in scientific investigations. An ever-growing number of scientific studies in several domains result in a massive amount of data, from which obtaining new knowledge requires computational help. One example is Alzheimer’s Disease, a life-threatening degenerative disease that is not yet curable. As the scientific community strives to better understand it and find a cure, great amounts of data have been generated, and new knowledge can be produced. A proper representation of such knowledge brings great benefits to researchers, to the scientific community, and consequently, to society. Methods In this article, we study and evaluate a semi-automatic method that generates knowledge graphs (KGs) from biomedical texts in the scientific literature. Our solution explores natural language processing techniques with the aim of extracting and representing scientific literature knowledge encoded in KGs. Our method links entities and relations represented in KGs to concepts from existing biomedical ontologies available on the Web. We demonstrate the effectiveness of our method by generating KGs from unstructured texts obtained from a set of abstracts taken from scientific papers on Alzheimer’s Disease. We involve physicians to compare our extracted triples with their manual extraction via their analysis of the abstracts. The evaluation also included a qualitative analysis by the physicians of the KGs generated with our software tool. Results The experimental results indicate the quality of the generated KGs. The proposed method extracts a large number of triples, showing the effectiveness of our rule-based method employed in the identification of relations in texts. In addition, ontology links are successfully obtained, which demonstrates the effectiveness of the ontology linking method proposed in this investigation. Conclusions We demonstrate that our proposal is effective in building ontology-linked KGs representing the knowledge obtained from biomedical scientific texts. Such representation can add value to the research in various domains, enabling researchers to compare the occurrence of concepts from different studies. The KGs generated may pave the way for proposing new theories based on data analysis to advance the state of the art in their research domains.
Affiliation(s)
- Anderson Rossanez
- Institute of Computing, University of Campinas, Campinas, SP, Brazil
- Ricardo da Silva Torres
- Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, NTNU - Norwegian University of Science and Technology, Ålesund, Norway
|
14
|
Liu T, Pan X, Wang X, Feenstra KA, Heringa J, Huang Z. Predicting the relationships between gut microbiota and mental disorders with knowledge graphs. Health Inf Sci Syst 2020; 9:3. [PMID: 33262885 PMCID: PMC7686388 DOI: 10.1007/s13755-020-00128-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/24/2020] [Accepted: 09/30/2020] [Indexed: 01/14/2023]
Abstract
Gut microbiota produce and modulate the production of neurotransmitters which have been implicated in mental disorders. Neurotransmitters may act as a ‘matchmaker’ between gut microbiota imbalance and mental disorders. Most of the relevant research effort goes into the relationship between gut microbiota and neurotransmitters and that between neurotransmitters and mental disorders, while few studies collect and analyze the dispersed research results in systematic ways. We therefore gather the dispersed results from existing studies into a structured knowledge base for identifying and predicting the potential relationships between gut microbiota and mental disorders. In this study, we propose to construct a gut microbiota knowledge graph for mental disorders, named MiKG4MD. It can be extended to link to future ontologies simply by adding new relationships between existing information and new entities. This extensibility is emphasized to support integration with existing popular ontologies/terminologies, e.g. UMLS, MeSH, and KEGG. We demonstrate the performance of MiKG4MD with three SPARQL query test cases. Results show that the MiKG4MD knowledge graph is an effective method to predict the relationships between gut microbiota and mental disorders.
Affiliation(s)
- Ting Liu
- Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Xueli Pan
- Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Xu Wang
- Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- K Anton Feenstra
- Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jaap Heringa
- Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Zhisheng Huang
- Knowledge Representation and Reasoning (KR&R) Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands; Brain Protection Innovation Center, Capital Medical University, Beijing, China
|
15
|
Sun H, Xiao J, Zhu W, He Y, Zhang S, Xu X, Hou L, Li J, Ni Y, Xie G. Medical Knowledge Graph to Enhance Fraud, Waste, and Abuse Detection on Claim Data: Model Development and Performance Evaluation. JMIR Med Inform 2020; 8:e17653. [PMID: 32706714 PMCID: PMC7413281 DOI: 10.2196/17653] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/31/2019] [Revised: 04/13/2020] [Accepted: 05/28/2020] [Indexed: 12/11/2022]
Abstract
Background Fraud, Waste, and Abuse (FWA) detection is a significant yet challenging problem in the health insurance industry. An essential step in FWA detection is to check whether the medication is clinically reasonable with respect to the diagnosis. Currently, human experts with sufficient medical knowledge are required to perform this task. To reduce the cost, insurance inspectors tend to build an intelligent system to detect suspicious claims with inappropriate diagnoses/medications automatically. Objective The aim of this study was to develop an automated method for making use of a medical knowledge graph to identify clinically suspected claims for FWA detection. Methods First, we identified the medical knowledge that is required to assess the clinical rationality of the claims. We then searched for data sources that contain information to build such knowledge. In this study, we focused on Chinese medical knowledge. Second, we constructed a medical knowledge graph using unstructured knowledge. We used a deep learning–based method to extract the entities and relationships from the knowledge sources and developed a multilevel similarity matching approach to conduct the entity linking. To guarantee the quality of the medical knowledge graph, we involved human experts to review the entity and relationships with lower confidence. These reviewed results could be used to further improve the machine-learning models. Finally, we developed the rules to identify the suspected claims by reasoning according to the medical knowledge graph. Results We collected 185,796 drug labels from the China Food and Drug Administration, 3390 types of disease information from medical textbooks (eg, symptoms, diagnosis, treatment, and prognosis), and information from 5272 examinations as the knowledge sources. The final medical knowledge graph includes 1,616,549 nodes and 5,963,444 edges. 
We designed three knowledge graph reasoning rules to identify three kinds of inappropriate diagnosis/medications. The experimental results showed that the medical knowledge graph helps to detect 70% of the suspected claims. Conclusions The medical knowledge graph–based method successfully identified suspected cases of FWA (such as fraud diagnosis, excess prescription, and irrational prescription) from the claim documents, which helped to improve the efficiency of claim processing.
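One of the checks described above, whether a medication is clinically reasonable for the diagnosis on a claim, can be pictured as a lookup over "treats" edges. The sketch below is only an illustration under stated assumptions: the `TREATS` mapping, the function `unsupported_medications`, and all drug and disease names are hypothetical placeholders, and the authors' actual reasoning rules are not given in the abstract.

```python
# Hypothetical "treats" edges of a medical knowledge graph: drug -> diseases.
# Placeholder names only; not data from the study.
TREATS = {
    "drug_a": {"disease_1", "disease_2"},
    "drug_b": {"disease_3"},
}


def unsupported_medications(diagnoses, medications, treats=TREATS):
    """Return medications with no 'treats' edge to any diagnosis on the claim.

    Drugs missing from the graph are flagged as well, because the graph
    offers no evidence that they fit the diagnosis.
    """
    diagnosed = set(diagnoses)
    flagged = []
    for drug in medications:
        indications = treats.get(drug, set())
        if not indications & diagnosed:
            flagged.append(drug)
    return flagged


claim = {"diagnoses": ["disease_1"], "medications": ["drug_a", "drug_b"]}
print(unsupported_medications(claim["diagnoses"], claim["medications"]))
```

A flagged drug is a candidate for human review rather than a verdict, which matches the abstract's description of identifying suspected claims.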
Affiliation(s)
- Haixia Sun
- Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
| | - Jin Xiao
- PingAn Health Technology, Shenzhen, China
| | - Wei Zhu
- PingAn Health Technology, Shenzhen, China
| | - Yilong He
- PingAn Health Technology, Shenzhen, China
| | | | - Xiaowei Xu
- Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
| | - Li Hou
- Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
| | - Jiao Li
- Institute of Medical Information & Library, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
| | - Yuan Ni
- PingAn Health Technology, Shenzhen, China
| | | |
|
16
|
Li N, Yang Z, Luo L, Wang L, Zhang Y, Lin H, Wang J. KGHC: a knowledge graph for hepatocellular carcinoma. BMC Med Inform Decis Mak 2020; 20:135. [PMID: 32646496 PMCID: PMC7346328 DOI: 10.1186/s12911-020-1112-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Hepatocellular carcinoma is one of the most common malignant neoplasms in adults and has high mortality. Mining relevant medical knowledge from rapidly growing text data and integrating it with other existing biomedical resources will support research on hepatocellular carcinoma. To this end, we constructed a knowledge graph for Hepatocellular Carcinoma (KGHC). METHODS We propose an approach to build a knowledge graph for hepatocellular carcinoma. Specifically, we first extracted knowledge from structured data and unstructured data. Since the extracted entities may contain some noise, we applied a biomedical information extraction system, named BioIE, to filter the data in KGHC. Then we introduced a fusion method to fuse the extracted data. Finally, we stored the data in Neo4j, which can help researchers analyze the network of hepatocellular carcinoma. RESULTS KGHC contains 13,296 triples and provides knowledge of hepatocellular carcinoma for healthcare professionals, sparing them from digging through large amounts of biomedical literature. This could hopefully improve the efficiency of research on hepatocellular carcinoma. KGHC is freely accessible for academic research purposes at http://202.118.75.18:18895/browser/ . CONCLUSIONS In this paper, we present a knowledge graph associated with hepatocellular carcinoma, which is constructed from vast amounts of structured and unstructured data. The evaluation results show that the data in KGHC is of high quality.
Affiliation(s)
- Nan Li
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
| | - Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
| | - Ling Luo
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
| | - Lei Wang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850 China
| | - Yin Zhang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850 China
| | - Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
| | - Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
| |
|
17
|
Du J, Li X. A Knowledge Graph of Combined Drug Therapies Using Semantic Predications From Biomedical Literature: Algorithm Development. JMIR Med Inform 2020; 8:e18323. [PMID: 32343247 PMCID: PMC7218597 DOI: 10.2196/18323] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/21/2020] [Revised: 03/26/2020] [Accepted: 03/29/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Combination therapy plays an important role in the effective treatment of malignant neoplasms and precision medicine. Numerous clinical studies have been carried out to investigate combination drug therapies. Automated knowledge discovery of these combinations and their graphic representation in knowledge graphs will enable pattern recognition and identification of drug combinations used to treat a specific type of cancer, improve drug efficacy and treatment of human disorders. OBJECTIVE This paper aims to develop an automated, visual approach to discover knowledge about combination therapies from biomedical literature, especially from those studies with high-level evidence such as clinical trial reports and clinical practice guidelines. METHODS Based on semantic predications, which consist of a triple structure of subject-predicate-object (SPO), we proposed an automated algorithm to discover knowledge of combination drug therapies using the following rules: 1) two or more semantic predications (S1-P-O and Si-P-O, i = 2, 3…) can be extracted from one conclusive claim (sentence) in the abstract of a given publication, and 2) these predications have an identical predicate (that closely relates to human disease treatment, eg, "treat") and object (eg, disease name) but different subjects (eg, drug names). A customized knowledge graph organizes and visualizes these combinations, improving the traditional semantic triples. After automatic filtering of broad concepts such as "pharmacologic actions" and generic disease names, a set of combination drug therapies were identified and characterized through manual interpretation. RESULTS We retrieved 22,263 clinical trial reports and 31 clinical practice guidelines from PubMed abstracts by searching "antineoplastic agents" for drug restriction (published between Jan 2009 and Oct 2019). 
There were 15,603 conclusive claims locally parsed using the search terms "conclusion*" and "conclude*" ready for semantic predications extraction by SemRep, and 325 candidate groups of semantic predications about combined medications were automatically discovered within 316 conclusive claims. Based on manual analysis, we determined that 255/316 claims (78.46%) were accurately identified as describing combination therapies and adopted these to construct the customized knowledge graph. We also identified two categories (and 4 subcategories) to characterize the inaccurate results: limitations of SemRep and limitations of proposal. We further learned the predominant patterns of drug combinations based on mechanism of action for new combined medication studies and discovered 4 obvious markers ("combin*," "coadministration," "co-administered," and "regimen") to identify potential combination therapies to enable development of a machine learning algorithm. CONCLUSIONS Semantic predications from conclusive claims in the biomedical literature can be used to support automated knowledge discovery and knowledge graph construction for combination therapies. A machine learning approach is warranted to take full advantage of the identified markers and other contextual features.
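The two rules stated in the Methods (several predications from one conclusive claim, sharing predicate and object but differing in subject) can be sketched as a grouping step. This is a minimal illustration, not the authors' implementation: the function `combination_candidates`, the tuple layout, the predicate label `TREATS`, and the sample predications are assumptions made for the example.

```python
from collections import defaultdict


def combination_candidates(predications, treatment_predicates=("TREATS",)):
    """Group predications from one sentence that share predicate and object.

    predications: iterable of (sentence_id, subject, predicate, object).
    Returns {(sentence_id, predicate, object): sorted subjects} for groups
    that contain two or more distinct subjects.
    """
    groups = defaultdict(set)
    for sentence_id, subj, pred, obj in predications:
        # Rule 2: keep only treatment-related predicates.
        if pred in treatment_predicates:
            groups[(sentence_id, pred, obj)].add(subj)
    # Rule 1: a candidate needs at least two subjects in the same sentence.
    return {
        key: sorted(subjects)
        for key, subjects in groups.items()
        if len(subjects) >= 2
    }


# Hypothetical placeholder predications; not output from SemRep.
sample = [
    ("s1", "drug_a", "TREATS", "disease_x"),
    ("s1", "drug_b", "TREATS", "disease_x"),
    ("s2", "drug_c", "TREATS", "disease_x"),
    ("s2", "drug_c", "AFFECTS", "disease_y"),
]
print(combination_candidates(sample))
```

Sentence `s2` yields no candidate because only one subject shares the treatment predicate and object. The filtering of broad concepts and the manual interpretation described in the abstract would follow this step and are not modeled here.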
Affiliation(s)
- Jian Du
- National Institute of Health Data Science, Peking University, Beijing, China
| | - Xiaoying Li
- Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China
| |
|
18
|
Frey LJ, Talbert DA. Artificial Intelligence Pipeline to Bridge the Gap between Bench Researchers and Clinical Researchers in Precision Medicine. MED ONE 2020; 5:10.20900/mo20200001. [PMID: 33511289 PMCID: PMC7839064 DOI: 10.20900/mo20200001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Precision medicine informatics is a field of research that incorporates learning systems that generate new knowledge to improve individualized treatments using integrated data sets and models. Given the ever-increasing volumes of data that are relevant to patient care, artificial intelligence (AI) pipelines need to be a central component of such research to speed discovery. Applying AI methodology to complex multidisciplinary information retrieval can support efforts to discover bridging concepts within collaborating communities. This dovetails with precision medicine research, given the information rich multi-omic data that are used in precision medicine analysis pipelines. In this perspective article we define a prototype AI pipeline to facilitate discovering research connections between bioinformatics and clinical researchers. We propose building knowledge representations that are iteratively improved through AI and human-informed learning feedback loops supported through crowdsourcing. To illustrate this, we will explore the specific use case of nonalcoholic fatty liver disease, a growing health care problem. We will examine AI pipeline construction and utilization in relation to bench-to-bedside bridging concepts with interconnecting knowledge representations applicable to bioinformatics researchers and clinicians.
Affiliation(s)
- Lewis J. Frey
- Department of Public Health Science, Biomedical Informatics Center, Hollings Cancer Center, Medical University of South Carolina (MUSC), 135 Cannon St, Charleston, SC 29425, USA
- Health Equity and Rural Outreach Innovation Center (HEROIC), Ralph H. Johnson Veteran Affairs Medical Center, Charleston, SC 29401, USA
| | - Douglas A. Talbert
- Department of Computer Science, Tennessee Tech University (TTU), 1 William L Jones Dr, Cookeville, TN 38505, USA
| |
|
19
|
Shi HB, Huang D, Wang L, Wu MY, Xu YC, Zeng BE, Pang C. An information integration approach to spacecraft fault diagnosis. ENTERP INF SYST-UK 2019. [DOI: 10.1080/17517575.2019.1709663] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Hui-Bin Shi
- School of Management, Shenyang University of Technology, Shenyang, China
| | - Dan Huang
- School of Management, Shenyang University of Technology, Shenyang, China
| | - Li Wang
- School of Economics and Management, Beihang University, Beijing, China
| | - Mei-Yu Wu
- School of Business, Shandong Normal University, Jinan, China
| | - Ying-Cheng Xu
- Quality Management Branch, National Institute of Standardization, China
| | - Bao-Er Zeng
- Beijing C-Stellar Science&Technology Institute Co., Ltd., Beijing, China
| | - Chen Pang
- Beijing C-Stellar Science&Technology Institute Co., Ltd., Beijing, China
| |
|
20
|
Sellami S, Dkaki T, Zarour NE, Charrel PJ. MidSemI. INTERNATIONAL JOURNAL OF INFORMATION SYSTEM MODELING AND DESIGN 2019. [DOI: 10.4018/ijismd.2019040101] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
The web diversification into the Web of Data and social media means that companies need to gather all the necessary data to help make the best-informed market decisions. However, data providers on the web publish data in various data models and may equip it with different search capabilities, thus requiring data integration techniques to access them. This work explores the current challenges in this area, discusses the limitations of some existing integration tools, and addresses them by proposing a semantic mediator-based approach to virtually integrate enterprise data with large-scale social and linked data. The implementation of the proposed approach is a configurable middleware application and a user-friendly keyword search interface that retrieves its input from internal enterprise data combined with various SPARQL endpoints and Web APIs. An evaluation study was conducted to compare its features with recent integration approaches. The results illustrate the added value and usability of the contributed approach.
Affiliation(s)
- Samir Sellami
- LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria
| | - Taoufiq Dkaki
- IRIT Laboratory, University of Toulouse 2 - Jean Jaurès, Toulouse, France
| | - Nacer Eddine Zarour
- LIRE Laboratory, University of Constantine 2 - Abdelhamid Mehri, Constantine, Algeria
| | | |
|
21
|
Gyrard A, Gaur M, Shekarpour S, Thirunarayan K, Sheth A. Personalized Health Knowledge Graph. CEUR WORKSHOP PROCEEDINGS 2018; 2317:5. [PMID: 34690624 PMCID: PMC8532078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Our current health applications do not adequately take into account contextual and personalized knowledge about patients. In order to design "Personalized Coach for Healthcare" applications to manage chronic diseases, there is a need to create a Personalized Healthcare Knowledge Graph (PHKG) that takes into consideration a patient's health condition (personalized knowledge) and enriches that with contextualized knowledge from environmental sensors and Web of Data (e.g., symptoms and treatments for diseases). To develop PHKG, aggregating knowledge from various heterogeneous sources such as the Internet of Things (IoT) devices, clinical notes, and Electronic Medical Records (EMRs) is necessary. In this paper, we explain the challenges of collecting, managing, analyzing, and integrating patients' health data from various sources in order to synthesize and deduce meaningful information embodying the vision of the Data, Information, Knowledge, and Wisdom (DIKW) pyramid. Furthermore, we sketch a solution that combines: 1) IoT data analytics, and 2) explicit knowledge and illustrate it using three chronic disease use cases - asthma, obesity, and Parkinson's.
|
22
|
Dhombres F, Charlet J. As Ontologies Reach Maturity, Artificial Intelligence Starts Being Fully Efficient: Findings from the Section on Knowledge Representation and Management for the Yearbook 2018. Yearb Med Inform 2018; 27:140-145. [PMID: 30157517 PMCID: PMC6115232 DOI: 10.1055/s-0038-1667078] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022] Open
Abstract
Objectives:
To select, present, and summarize the best papers published in 2017 in the field of Knowledge Representation and Management (KRM).
Methods:
A comprehensive and standardized review of the medical informatics literature was performed to select the most interesting papers of KRM published in 2017, based on a PubMed query.
Results:
In direct line with the research on data integration presented in the KRM section of the 2017 edition of the International Medical Informatics Association (IMIA) Yearbook, the five best papers for 2018 demonstrate even further the added-value of ontology-based integration approaches for phenotype-genotype association mining. Additionally, among the 15 preselected papers, two aspects of KRM are in the spotlight: the design of knowledge bases and new challenges in using ontologies.
Conclusions:
Ontologies are demonstrating their maturity to integrate medical data and begin to support clinical practices. New challenges have emerged: the query on distributed semantically annotated datasets, the efficiency of semantic annotation processes, the semantic representation of large textual datasets, the control of biases associated with semantic annotations, and the computation of Bayesian indicators on data annotated with ontologies.
Affiliation(s)
- Ferdinand Dhombres
- Sorbonne Université, Université Paris 13, Sorbonne Paris Cité, INSERM, UMR_S 1142, LIMICS, Paris, France.,Sorbonne Université Médecine, Service de Médecine Foetale, AP-HP/HUEP, Hôpital Armand Trousseau, Paris, France
| | - Jean Charlet
- Sorbonne Université, Université Paris 13, Sorbonne Paris Cité, INSERM, UMR_S 1142, LIMICS, Paris, France.,AP-HP, DRCI, Paris, France
| | | |
|