1
Rossanez A, Dos Reis JC, Torres RDS, de Ribaupierre H. KGen: a knowledge graph generator from biomedical scientific literature. BMC Med Inform Decis Mak 2020; 20:314. [PMID: 33317512 PMCID: PMC7734730 DOI: 10.1186/s12911-020-01341-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 09/09/2020] [Accepted: 11/17/2020] [Indexed: 11/26/2022] Open
Abstract
Background: Knowledge is often produced from data generated in scientific investigations. An ever-growing number of scientific studies in several domains result in a massive amount of data, from which obtaining new knowledge requires computational help. One example is Alzheimer’s Disease, a life-threatening degenerative disease that is not yet curable. As the scientific community strives to better understand it and find a cure, great amounts of data have been generated, and new knowledge can be produced. A proper representation of such knowledge brings great benefits to researchers, to the scientific community, and consequently, to society.
Methods: In this article, we study and evaluate a semi-automatic method that generates knowledge graphs (KGs) from biomedical texts in the scientific literature. Our solution explores natural language processing techniques with the aim of extracting and representing scientific literature knowledge encoded in KGs. Our method links entities and relations represented in KGs to concepts from existing biomedical ontologies available on the Web. We demonstrate the effectiveness of our method by generating KGs from unstructured texts obtained from a set of abstracts taken from scientific papers on Alzheimer’s Disease. We involved physicians to compare our extracted triples with those they extracted manually from their own analysis of the abstracts. The evaluation further included a qualitative analysis, by the physicians, of the KGs generated with our software tool.
Results: The experimental results indicate the quality of the generated KGs. The proposed method extracts a large number of triples, showing the effectiveness of our rule-based method employed in the identification of relations in texts. In addition, ontology links are successfully obtained, which demonstrates the effectiveness of the ontology linking method proposed in this investigation.
Conclusions: We demonstrate that our proposal is effective in building ontology-linked KGs representing the knowledge obtained from biomedical scientific texts. Such representation can add value to the research in various domains, enabling researchers to compare the occurrence of concepts from different studies. The KGs generated may pave the way to the proposal of new theories based on data analysis to advance the state of the art in their research domains.
Affiliation(s)
- Anderson Rossanez
- Institute of Computing, University of Campinas, Campinas, SP, Brazil.
- Ricardo da Silva Torres
- Department of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, NTNU - Norwegian University of Science and Technology, Ålesund, Norway
2
Jain S, Khaiboullina SF, Baranwal M. Immunological Perspective for Ebola Virus Infection and Various Treatment Measures Taken to Fight the Disease. Pathogens 2020; 9:E850. [PMID: 33080902 PMCID: PMC7603231 DOI: 10.3390/pathogens9100850] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/20/2020] [Revised: 10/07/2020] [Accepted: 10/16/2020] [Indexed: 12/19/2022] Open
Abstract
Ebolaviruses, discovered in 1976, belong to the Filoviridae family, which also includes Marburg and Lloviu viruses. They are negative-stranded RNA viruses with six known species identified to date. Ebola virus (EBOV) is a member of the Zaire ebolavirus species and can cause Ebola virus disease (EVD), an emerging zoonotic disease that results in homeostatic imbalance and multi-organ failure. Three EBOV outbreaks have been documented in the last six years, resulting in significant morbidity (> 32,000 cases) and mortality (> 13,500 deaths). The potential factors contributing to the high infectivity of this virus include multiple entry mechanisms, susceptibility of the host cells, employment of multiple immune evasion mechanisms and rapid person-to-person transmission. EBOV infection leads to cytokine storm, disseminated intravascular coagulation, host T cell apoptosis as well as cell-mediated and humoral immune responses. This review presents a concise recap of the cell types targeted by EBOV and of EVD symptoms, followed by a detailed run-through of host innate and adaptive immune responses, virus-driven regulation and their combined effects contributing to disease pathogenesis. Finally, vaccine and drug development initiatives, as well as challenges related to the management of infection, are discussed.
Affiliation(s)
- Sahil Jain
- Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala 147004, Punjab, India
- Svetlana F. Khaiboullina
- Department of Microbiology and Immunology, University of Nevada, Reno, NV 89557, USA
- Institute of Fundamental Medicine and Biology, Kazan Federal University, 420008 Kazan, Tatarstan, Russia
- Manoj Baranwal
- Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala 147004, Punjab, India
3
Li N, Yang Z, Luo L, Wang L, Zhang Y, Lin H, Wang J. KGHC: a knowledge graph for hepatocellular carcinoma. BMC Med Inform Decis Mak 2020; 20:135. [PMID: 32646496 PMCID: PMC7346328 DOI: 10.1186/s12911-020-1112-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Hepatocellular carcinoma is one of the most common malignant neoplasms in adults, with high mortality. Mining relevant medical knowledge from rapidly growing text data and integrating it with other existing biomedical resources will support research on hepatocellular carcinoma. For this purpose, we constructed a knowledge graph for Hepatocellular Carcinoma (KGHC).
METHODS: We propose an approach to build a knowledge graph for hepatocellular carcinoma. Specifically, we first extracted knowledge from structured data and unstructured data. Since the extracted entities may contain some noise, we applied a biomedical information extraction system, named BioIE, to filter the data in KGHC. Then we introduced a fusion method to fuse the extracted data. Finally, we stored the data in Neo4j, which can help researchers analyze the network of hepatocellular carcinoma.
RESULTS: KGHC contains 13,296 triples and provides knowledge of hepatocellular carcinoma to healthcare professionals, sparing them from digging through a large amount of biomedical literature. This could hopefully improve the efficiency of research on hepatocellular carcinoma. KGHC is freely accessible for academic research purposes at http://202.118.75.18:18895/browser/.
CONCLUSIONS: In this paper, we present a knowledge graph associated with hepatocellular carcinoma, which is constructed from vast amounts of structured and unstructured data. The evaluation results show that the data in KGHC is of high quality.
Affiliation(s)
- Nan Li
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
- Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
- Ling Luo
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
- Lei Wang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850 China
- Yin Zhang
- Beijing Institute of Health Administration and Medical Information, Beijing, 100850 China
- Hongfei Lin
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
- Jian Wang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 China
4
Kamdar MR, Fernández JD, Polleres A, Tudorache T, Musen MA. Enabling Web-scale data integration in biomedicine through Linked Open Data. NPJ Digit Med 2019; 2:90. [PMID: 31531395 PMCID: PMC6736878 DOI: 10.1038/s41746-019-0162-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/09/2019] [Accepted: 08/06/2019] [Indexed: 01/17/2023] Open
Abstract
The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.
Affiliation(s)
- Maulik R. Kamdar
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA USA
- Javier D. Fernández
- Vienna University of Economics & Business, Vienna, Austria
- Complexity Science Hub Vienna, Vienna, Austria
- Axel Polleres
- Vienna University of Economics & Business, Vienna, Austria
- Complexity Science Hub Vienna, Vienna, Austria
- Tania Tudorache
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA USA
- Mark A. Musen
- Center for Biomedical Informatics Research, Stanford University, Stanford, CA USA
5
Balmith M, Soliman MES. Potential Ebola drug targets — filling the gap: a critical step forward towards the design and discovery of potential drugs. Biologia (Bratisl) 2017. [DOI: 10.1515/biolog-2017-0012] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/29/2022]
6
EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases. Journal of Healthcare Informatics Research 2017; 1:260-303. [DOI: 10.1007/s41666-017-0010-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2017] [Revised: 10/10/2017] [Accepted: 10/11/2017] [Indexed: 10/18/2022]
7
Kamdar MR, Musen MA. PhLeGrA: Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data. Proceedings of the ... International World-Wide Web Conference. International WWW Conference 2017; 2017:321-329. [PMID: 29479581 PMCID: PMC5824722 DOI: 10.1145/3038912.3052692] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/06/2023]
Abstract
Integrated approaches for pharmacology are required for the mechanism-based predictions of adverse drug reactions that manifest due to concomitant intake of multiple drugs. These approaches require the integration and analysis of biomedical data and knowledge from multiple, heterogeneous sources with varying schemas, entity notations, and formats. To tackle these integrative challenges, the Semantic Web community has published and linked several datasets in the Life Sciences Linked Open Data (LSLOD) cloud using established W3C standards. We present the PhLeGrA platform for Linked Graph Analytics in Pharmacology in this paper. Through query federation, we integrate four sources from the LSLOD cloud and extract a drug-reaction network, composed of distinct entities. We represent this graph as a hidden conditional random field (HCRF), a discriminative latent variable model that is used for structured output predictions. We calculate the underlying probability distributions in the drug-reaction HCRF using the datasets from the U.S. Food and Drug Administration's Adverse Event Reporting System. We predict the occurrence of 146 adverse reactions due to multiple drug intake with an AUROC statistic greater than 0.75. The PhLeGrA platform can be extended to incorporate other sources published using Semantic Web technologies, as well as to discover other types of pharmacological associations.
Affiliation(s)
- Maulik R Kamdar
- Center for Biomedical Informatics Research, Stanford University, USA
- Mark A Musen
- Center for Biomedical Informatics Research, Stanford University, USA
8
Stano M, Beke G, Klucar L. viruSITE-integrated database for viral genomics. Database: The Journal of Biological Databases and Curation 2016; 2016:baw162. [PMID: 28025349 PMCID: PMC5199161 DOI: 10.1093/database/baw162] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 06/03/2016] [Revised: 11/11/2016] [Accepted: 11/16/2016] [Indexed: 11/14/2022]
Abstract
Viruses are the most abundant biological entities and the reservoir of most of the genetic diversity in the Earth's biosphere. Viral genomes are very diverse, generally short in length and, compared to other organisms, carry only a few genes. viruSITE is a novel database which brings together high-value information compiled from various resources. viruSITE covers the whole universe of viruses and focuses on viral genomes, genes and proteins. The database contains information on virus taxonomy, host range, genome features, sequential relatedness as well as the properties and functions of viral genes and proteins. All entries in the database are linked to numerous information resources. The above-mentioned features make viruSITE a comprehensive knowledge hub in the field of viral genomics. The web interface of the database was designed to offer an easy-to-navigate, intuitive and user-friendly environment. It provides sophisticated text searching and a taxonomy-based browsing system. viruSITE also allows for an alternative approach based on sequence search. A proprietary genome browser generates a graphical representation of viral genomes. In addition to retrieving and visualising data, users can perform comparative genomics analyses using a variety of tools. Database URL: http://www.virusite.org/
Affiliation(s)
- Matej Stano
- Laboratory of Bioinformatics, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
- Gabor Beke
- Laboratory of Bioinformatics, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
- Lubos Klucar
- Laboratory of Bioinformatics, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia