Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kilicoglu H, Rosemblat G, Fiszman M, Shin D. Broad-coverage biomedical relation extraction with SemRep. BMC Bioinformatics 2020;21:188. [PMID: 32410573 PMCID: PMC7222583 DOI: 10.1186/s12859-020-3517-7] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 02/18/2020] [Accepted: 04/29/2020] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Wei CH, Allot A, Lai PT, Leaman R, Tian S, Luo L, Jin Q, Wang Z, Chen Q, Lu Z. PubTator 3.0: an AI-powered literature resource for unlocking biomedical knowledge. Nucleic Acids Res 2024;52:W540-W546. [PMID: 38572754 PMCID: PMC11223843 DOI: 10.1093/nar/gkae235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/02/2024] [Accepted: 03/21/2024] [Indexed: 04/05/2024] Open

Ming S, Zhang R, Kilicoglu H. Enhancing the coverage of SemRep using a relation classification approach. J Biomed Inform 2024;155:104658. [PMID: 38782169 DOI: 10.1016/j.jbi.2024.104658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 05/01/2024] [Accepted: 05/18/2024] [Indexed: 05/25/2024]

Abstract

OBJECTIVE

Relation extraction is an essential task in the field of biomedical literature mining and offers significant benefits for various downstream applications, including database curation, drug repurposing, and literature-based discovery. The broad-coverage natural language processing (NLP) tool SemRep has established a solid baseline for extracting subject-predicate-object triples from biomedical text and has served as the backbone of the Semantic MEDLINE Database (SemMedDB), a PubMed-scale repository of semantic triples. While SemRep achieves reasonable precision (0.69), its recall is relatively low (0.42). In this study, we aimed to enhance SemRep using a relation classification approach, in order to eventually increase the size and the utility of SemMedDB.

METHODS

We combined and extended existing SemRep evaluation datasets to generate training data. We leveraged the pre-trained PubMedBERT model, enhancing it through additional contrastive pre-training and fine-tuning. We experimented with three entity representations: mentions, semantic types, and semantic groups. We evaluated the model performance on a portion of the SemRep Gold Standard dataset and compared it to SemRep performance. We also assessed the effect of the model on a larger set of 12K randomly selected PubMed abstracts.

RESULTS

Our results show that the best model yields a precision of 0.62, recall of 0.81, and F1 score of 0.70. Assessment on 12K abstracts shows that the model could double the size of SemMedDB when applied to the entire PubMed. We also manually assessed the quality of 506 triples predicted by the model that SemRep had not previously identified, and found that 67% of these triples were correct.
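As a quick check on the figures quoted above, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall. The SemRep F1 it prints is derived here from the precision (0.69) and recall (0.42) given in the Objective and is not stated in the abstract; the two systems may also have been scored on different test sets, so the comparison is indicative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract above.
semrep_f1 = f1_score(0.69, 0.42)  # SemRep baseline (F1 derived here, not reported)
model_f1 = f1_score(0.62, 0.81)   # best relation classification model

print(f"SemRep F1: {semrep_f1:.2f}")
print(f"Model  F1: {model_f1:.2f}")
```

The higher F1 of the model comes almost entirely from recall: precision drops slightly while recall nearly doubles.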

CONCLUSION

These findings underscore the promise of our model in achieving a more comprehensive coverage of relationships mentioned in biomedical literature, thereby showing its potential in enhancing various downstream applications of biomedical literature mining. Data and code related to this study are available at https://github.com/Michelle-Mings/SemRep_RelationClassification.

Tyagin I, Safro I. Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique. BMC Bioinformatics 2024;25:213. [PMID: 38872097 PMCID: PMC11177514 DOI: 10.1186/s12859-024-05812-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 05/16/2024] [Indexed: 06/15/2024] Open

Xiao Y, Hou Y, Zhou H, Diallo G, Fiszman M, Wolfson J, Zhou L, Kilicoglu H, Chen Y, Su C, Xu H, Mantyh WG, Zhang R. Repurposing non-pharmacological interventions for Alzheimer's disease through link prediction on biomedical literature. Sci Rep 2024;14:8693. [PMID: 38622164 PMCID: PMC11018822 DOI: 10.1038/s41598-024-58604-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 04/01/2024] [Indexed: 04/17/2024] Open

Jin S, Liang H, Zhang W, Li H. Knowledge Graph for Breast Cancer Prevention and Treatment: Literature-Based Data Analysis Study. JMIR Med Inform 2024;12:e52210. [PMID: 38409769 PMCID: PMC11004512 DOI: 10.2196/52210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 01/02/2024] [Accepted: 01/06/2024] [Indexed: 02/28/2024] Open

Abstract

Background

The incidence of breast cancer has remained high and has continued to rise since the start of the 21st century. Consequently, there has been a significant increase in research efforts focused on breast cancer prevention and treatment. Despite the extensive body of literature available on this subject, systematic integration is lacking. To address this issue, knowledge graphs have emerged as a valuable tool. By harnessing their powerful knowledge integration capabilities, knowledge graphs offer a comprehensive and structured approach to understanding breast cancer prevention and treatment.

Objective

We aim to integrate literature data on breast cancer treatment and prevention, build a knowledge graph, and provide support for clinical decision-making.

Methods

We used Medical Subject Headings terms to search for clinical trial literature on breast cancer prevention and treatment published on PubMed between 2018 and 2022. We downloaded triplet data from the Semantic MEDLINE Database (SemMedDB) and matched them with the retrieved literature to obtain triplet data for the target articles. We visualized the triplet information using NetworkX for knowledge discovery.
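The matching step described above can be sketched as a filter on PMID followed by building an adjacency structure. The study used NetworkX for visualization; the sketch below uses only the Python standard library as a stand-in, and the row layout, PMIDs, and triples are hypothetical toy data rather than actual SemMedDB content.

```python
from collections import Counter, defaultdict

# Hypothetical SemMedDB-style rows: (PMID, subject, predicate, object).
triples = [
    ("111", "Trastuzumab", "TREATS", "Breast Carcinoma"),
    ("111", "Pharmacotherapy", "TREATS", "Malignant Neoplasms"),
    ("222", "Tamoxifen", "PREVENTS", "Breast Carcinoma"),
    ("999", "Aspirin", "TREATS", "Headache"),  # PMID not in the retrieved set
]
retrieved_pmids = {"111", "222"}  # toy stand-in for the PubMed search result


def build_graph(rows, pmids):
    """Keep triples from the retrieved articles; map subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for pmid, subj, pred, obj in rows:
        if pmid in pmids:
            graph[subj].append((pred, obj))
    return graph


def concept_frequency(graph):
    """Count how often each concept appears as a subject or an object."""
    counts = Counter()
    for subj, edges in graph.items():
        for _pred, obj in edges:
            counts[subj] += 1
            counts[obj] += 1
    return counts


graph = build_graph(triples, retrieved_pmids)
freq = concept_frequency(graph)
print(freq.most_common(3))
```

Concept frequencies of this kind are what the Results section reports as proportions, for example 587/1387 for malignant neoplasms.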

Results

Within the scope of literature research in the past 5 years, malignant neoplasms appeared most frequently (587/1387, 42.3%). Pharmacotherapy (267/1387, 19.3%) was the primary treatment method, with trastuzumab (209/1805, 11.6%) being the most commonly used therapeutic drug. Through the analysis of the knowledge graph, we have discovered a complex network of relationships between treatment methods, therapeutic drugs, and preventive measures for different types of breast cancer.

Conclusions

This study constructed a knowledge graph for breast cancer prevention and treatment, which enabled the integration and knowledge discovery of relevant literature in the past 5 years. Researchers can gain insights into treatment methods, drugs, preventive knowledge regarding adverse reactions to treatment, and the associations between different knowledge domains from the graph.

Kilicoglu H, Ensan F, McInnes B, Wang LL. Semantics-enabled biomedical literature analytics. J Biomed Inform 2024;150:104588. [PMID: 38244957 DOI: 10.1016/j.jbi.2024.104588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 01/22/2024]

Colicchio TK, Osborne JD, Do Rosario CV, Anand A, Timkovich NA, Wyatt MC, Cimino JJ. Semantically oriented EHR navigation with a patient specific knowledge base and a clinical context ontology. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2024;2023:309-318. [PMID: 38222434 PMCID: PMC10785934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]

Jeynes JCG, James T, Corney M. Natural Language Processing for Drug Discovery Knowledge Graphs: Promises and Pitfalls. Methods Mol Biol 2024;2716:223-240. [PMID: 37702942 DOI: 10.1007/978-1-0716-3449-3_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]

Millikin RJ, Raja K, Steill J, Lock C, Tu X, Ross I, Tsoi LC, Kuusisto F, Ni Z, Livny M, Bockelman B, Thomson J, Stewart R. Serial KinderMiner (SKiM) discovers and annotates biomedical knowledge using co-occurrence and transformer models. BMC Bioinformatics 2023;24:412. [PMID: 37915001 PMCID: PMC10619245 DOI: 10.1186/s12859-023-05539-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 10/19/2023] [Indexed: 11/03/2023] Open

Abstract

BACKGROUND

The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues.
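The A-B-C pattern described above can be sketched in two passes: first keep the B terms that co-occur significantly with A, then keep the C terms that co-occur significantly with a surviving B. The abstract does not say which significance test SKiM applies; a one-sided Fisher's exact test is assumed here as one reasonable choice. The function names, the `alpha` threshold, and the toy corpus are all hypothetical.

```python
from math import comb


def fisher_right_tail(both, only_x, only_y, neither):
    """One-sided Fisher's exact p-value: chance of at least `both` co-occurrences."""
    n = both + only_x + only_y + neither
    row = both + only_x  # documents mentioning x
    col = both + only_y  # documents mentioning y
    tail = sum(
        comb(row, k) * comb(n - row, col - k)
        for k in range(both, min(row, col) + 1)
    )
    return tail / comb(n, col)


def link_p_value(docs, x, y):
    """Build the 2x2 co-occurrence table for terms x and y and test it."""
    both = sum(1 for d in docs if x in d and y in d)
    only_x = sum(1 for d in docs if x in d and y not in d)
    only_y = sum(1 for d in docs if y in d and x not in d)
    neither = len(docs) - both - only_x - only_y
    return fisher_right_tail(both, only_x, only_y, neither)


def abc_search(docs, a, b_terms, c_terms, alpha=0.05):
    """Return (A, B, C) triples where both the A-B and B-C links are significant."""
    hits = []
    for b in b_terms:
        if link_p_value(docs, a, b) >= alpha:
            continue  # B is not significantly linked to A
        for c in c_terms:
            if link_p_value(docs, b, c) < alpha:
                hits.append((a, b, c))
    return hits


# Toy corpus: each document is the set of terms it mentions (made-up data).
docs = (
    [{"fish oil", "blood viscosity"}] * 5
    + [{"blood viscosity", "raynaud disease"}] * 5
    + [{"unrelated term"}] * 10
)
hits = abc_search(
    docs,
    a="fish oil",
    b_terms=["blood viscosity", "unrelated term"],
    c_terms=["raynaud disease"],
)
print(hits)
```

Note that A and C never share a document in the toy corpus; the link is found only through the intermediate B term, which is the point of the approach.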

RESULTS

We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface ( https://skim.morgridge.org ) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches.

CONCLUSIONS

SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond simply identifying that a relationship exists; many relationships are given relationship type labels from our knowledge graph.

Tan J, Hu J, Dong S. Incorporating entity-level knowledge in pretrained language model for biomedical dense retrieval. Comput Biol Med 2023;166:107535. [PMID: 37788508 DOI: 10.1016/j.compbiomed.2023.107535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 09/04/2023] [Accepted: 09/27/2023] [Indexed: 10/05/2023]

Lou P, Fang A, Zhao W, Yao K, Yang Y, Hu J. Potential Target Discovery and Drug Repurposing for Coronaviruses: Study Involving a Knowledge Graph-Based Approach. J Med Internet Res 2023;25:e45225. [PMID: 37862061 PMCID: PMC10592722 DOI: 10.2196/45225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 08/30/2023] [Accepted: 09/22/2023] [Indexed: 10/21/2023] Open

Abstract

BACKGROUND

The global pandemics of severe acute respiratory syndrome, Middle East respiratory syndrome, and COVID-19 have caused unprecedented crises for public health. Coronaviruses are constantly evolving, and it is unknown which new coronavirus will emerge and when the next coronavirus will sweep across the world. Knowledge graphs are expected to help discover the pathogenicity and transmission mechanism of viruses.

OBJECTIVE

The aim of this study was to discover potential targets and candidate drugs to repurpose for coronaviruses through a knowledge graph-based approach.

METHODS

We propose a computational and evidence-based knowledge discovery approach to identify potential targets and candidate drugs for coronaviruses from biomedical literature and well-known knowledge bases. To organize the semantic triples extracted automatically from biomedical literature, a semantic conversion model was designed. The literature knowledge was associated and integrated with existing drug and gene knowledge through semantic mapping, and the coronavirus knowledge graph (CovKG) was constructed. We adopted both the knowledge graph embedding model and the semantic reasoning mechanism to discover unrecorded mechanisms of drug action as well as potential targets and drug candidates. Furthermore, we have provided evidence-based support with a scoring and backtracking mechanism.

RESULTS

The constructed CovKG contains 17,369,620 triples, of which 641,195 were extracted from biomedical literature, covering 13,065 concept unique identifiers, 209 semantic types, and 97 semantic relations of the Unified Medical Language System. Through multi-source knowledge integration, 475 drugs and 262 targets were mapped to existing knowledge, and 41 new drug mechanisms of action were found by semantic reasoning, which were not recorded in the existing knowledge base. Among the knowledge graph embedding models, TransR outperformed others (mean reciprocal rank=0.2510, Hits@10=0.3505). A total of 33 potential targets and 18 drug candidates were identified for coronaviruses. Among them, 7 novel drugs (ie, quinine, nelfinavir, ivermectin, asunaprevir, tylophorine, Artemisia annua extract, and resveratrol) and 3 highly ranked targets (ie, angiotensin converting enzyme 2, transmembrane serine protease 2, and M protein) were further discussed.
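The two link-prediction metrics quoted above, mean reciprocal rank (MRR) and Hits@10, are both computed from the rank that a model assigns to the correct entity for each test triple. A minimal sketch follows; the ranks are made-up toy values, not results from the study.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (rank 1 = correct entity on top)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k=10):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy ranks of the correct entity for four hypothetical test triples.
ranks = [1, 4, 10, 25]

mrr = mean_reciprocal_rank(ranks)
hits10 = hits_at_k(ranks, k=10)
print(f"MRR = {mrr:.4f}, Hits@10 = {hits10:.4f}")
```

MRR rewards placing the correct entity near the top, whereas Hits@10 only asks whether it appears among the first ten candidates, so a model can score well on one and modestly on the other.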

CONCLUSIONS

We showed the effectiveness of a knowledge graph-based approach in potential target discovery and drug repurposing for coronaviruses. Our approach can be extended to other viruses or diseases for biomedical knowledge discovery and relevant applications.

Jeynes JCG, Corney M, James T. A large-scale evaluation of NLP-derived chemical-gene/protein relationships from the scientific literature: Implications for knowledge graph construction. PLoS One 2023;18:e0291142. [PMID: 37682956 PMCID: PMC10490933 DOI: 10.1371/journal.pone.0291142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023] Open

Abstract

One area of active research is the use of natural language processing (NLP) to mine biomedical texts for sets of triples (subject-predicate-object) for knowledge graph (KG) construction. While statistical methods to mine co-occurrences of entities within sentences are relatively robust, accurate relationship extraction is more challenging. Herein, we evaluate the Global Network of Biomedical Relationships (GNBR), a dataset that uses distributional semantics to model relationships between biomedical entities. The focus of our paper is an evaluation of a subset of the GNBR data: the relationships between chemicals and genes/proteins. We use Evotec's structured 'Nexus' database of >2.76M chemical-protein interactions as a ground truth to compare with GNBR's relationships and find a micro-averaged precision-recall area under the curve (AUC) of 0.50 and a micro-averaged receiver operating characteristic (ROC) curve AUC of 0.71 across the relationship classes 'inhibits', 'binding', 'agonism' and 'antagonism', when a comparison is made on a sentence-by-sentence basis. We conclude that, even though these micro-average scores are modest, using a high threshold on certain relationship classes like 'inhibits' could yield high fidelity triples that are not reported in structured datasets. We discuss how different methods of processing GNBR data and the factuality of triples could affect the accuracy of NLP data incorporated into knowledge graphs. We provide a GNBR-Nexus (ChEMBL-subset) merged datafile that contains over 20,000 sentences where a protein/gene and a chemical co-occur and includes both the GNBR relationship scores as well as the ChEMBL (manually curated) relationships (e.g., 'agonist', 'inhibitor'); this can be accessed at https://doi.org/10.5281/zenodo.8136752. We envisage this being used to aid curation efforts by the drug discovery community.
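Micro-averaging, as used for the AUC figures above, pools the (score, label) pairs from every relationship class before computing a single metric. The sketch below does this for ROC AUC using the rank interpretation (the probability that a true instance outscores a false one). It assumes a one-vs-rest setup with one score per sentence and class; the scores and labels are made-up toy values, not GNBR data.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_roc_auc(per_class):
    """Pool scores and labels from all classes, then compute one AUC."""
    scores, labels = [], []
    for class_scores, class_labels in per_class.values():
        scores.extend(class_scores)
        labels.extend(class_labels)
    return roc_auc(scores, labels)


# Toy data: class -> (model scores, ground-truth labels), one entry per sentence.
per_class = {
    "inhibits": ([0.9, 0.2], [1, 0]),
    "binding": ([0.4, 0.6], [1, 0]),
}

micro_auc = micro_roc_auc(per_class)
print(f"micro-averaged ROC AUC = {micro_auc:.2f}")
```

Because pooling compares scores across classes, a class whose scores run systematically lower can pull the micro-average down even when it ranks its own instances well, which is one reason per-class thresholds are worth examining.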

Fan W, Zhang Y, Wang D, Wang C, Yang J. The impact of Yiwei decoction on the LncRNA and CircRNA regulatory networks in premature ovarian insufficiency. Heliyon 2023;9:e20022. [PMID: 37809621 PMCID: PMC10559751 DOI: 10.1016/j.heliyon.2023.e20022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 08/19/2023] [Accepted: 09/08/2023] [Indexed: 10/10/2023] Open

Abstract

Premature ovarian insufficiency (POI) is a female reproductive aging illness. Yiwei decoction (YWD) is a traditional treatment for Yangming nourishment. YWD can treat premature ovarian insufficiency, but the exact molecular mechanism is unknown. This paper therefore investigated the differential expression of long noncoding RNAs (LncRNAs) and circular RNAs (CircRNAs) in the ovaries of POI rats after YWD treatment and built the ceRNA regulatory network. The POI model was induced with cyclophosphamide. Group A was the model group treated with YWD, Group B was the model control group, and Group C was the normal control group. In this study, 177 differentially expressed long noncoding RNAs (DELncRNAs) and 190 differentially expressed circular RNAs (DECircRNAs) were discovered between A and B (P < 0.05, |LogFC| > 1). Following the analysis, 27 DELncRNAs and 96 DECircRNAs (P-adjusted < 0.05, |LogFC| > 1) were discovered. At the same time, we built the ceRNA network using mRNAs and microRNAs (miRNAs) differentially expressed between groups A and B. The DELncRNAs were used to construct a lncRNA-miRNA-mRNA ceRNA network with 27 LncRNAs, 4 miRNAs, and 19 mRNAs. The DECircRNAs were used to establish a CircRNA-miRNA-mRNA ceRNA network made up of 15 CircRNAs, 4 miRNAs, and 20 mRNAs. The highly correlated regulatory networks were LncMSTRG.22691.3/miR-3102/ANGPT4 and Circ10_34698898_34699378/miR-33-5p/TTC22. Circ20_12035276_12036793, Circ20_30693935_30696337, Circ4_157723097_157723378 and Circ4_157923266_157923904 occurred in all three comparisons (A vs B, B vs C, and A vs C). MiRDB predicted eight target miRNAs for these CircRNAs. The miRanda (score = 140, energy = -1) binding energy calculation revealed that seven miRNAs paired well with three CircRNAs through base complementarity. This implies that 3 DECircRNAs could serve as sponges for these miRNAs.
Network pharmacological analysis showed that ten active components in YWD, such as stigmasterol, uridine, ophiopogonanone A, and gamma-aminobutyric acid, may regulate the expression of LncRNAs and CircRNAs. In conclusion, this study combined transcriptomics and network pharmacological analysis to identify differentially expressed lncRNAs as well as CircRNAs in ovaries of YWD-treated POI rats, thereby constructing ceRNA networks implicated in POI. This would contribute to clarifying the pathways by which Chinese herbal compounds regulate gene expression in POI.

Malec SA, Taneja SB, Albert SM, Elizabeth Shaaban C, Karim HT, Levine AS, Munro P, Callahan TJ, Boyce RD. Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease. J Biomed Inform 2023;142:104368. [PMID: 37086959 PMCID: PMC10355339 DOI: 10.1016/j.jbi.2023.104368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Revised: 03/03/2023] [Accepted: 04/17/2023] [Indexed: 04/24/2023]

Abstract

BACKGROUND

Causal feature selection is essential for estimating effects from observational data. Identifying confounders is a crucial step in this process. Traditionally, researchers employ content-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneous conditioning on variables combining roles introduces bias. However, the vast literature is growing exponentially, making it infeasible to assimilate this knowledge. To address these challenges, we introduce a novel knowledge graph (KG) application enabling causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data.

METHODS

We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and published observational studies and examined a selection of confounding and combined role variables in-depth.
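The translation of epidemiological role definitions into graph queries can be sketched over a set of directed "causes" edges. For an exposure X and outcome Y, a confounder Z has Z→X and Z→Y, a mediator has X→Z→Y, and a collider has X→Z and Y→Z. The sketch below checks direct edges only, whereas the study also inferred missing knowledge through logical closure; the variable names Z1 to Z4 and the edges are hypothetical placeholders, not findings.

```python
from collections import defaultdict


def classify_roles(edges, exposure, outcome):
    """Label each third variable by the role(s) its direct causal edges imply."""
    effects = defaultdict(set)  # node -> set of nodes it directly causes
    for cause, effect in edges:
        effects[cause].add(effect)

    others = {n for edge in edges for n in edge} - {exposure, outcome}
    roles = {}
    for z in sorted(others):
        found = set()
        if exposure in effects[z] and outcome in effects[z]:
            found.add("confounder")  # Z -> X and Z -> Y
        if z in effects[exposure] and outcome in effects[z]:
            found.add("mediator")    # X -> Z -> Y
        if z in effects[exposure] and z in effects[outcome]:
            found.add("collider")    # X -> Z <- Y
        roles[z] = found
    return roles


# Hypothetical literature-derived edges (cause, effect); Z1..Z4 are placeholders.
edges = [
    ("Z1", "depression"), ("Z1", "alzheimers"),
    ("depression", "Z2"), ("Z2", "alzheimers"),
    ("depression", "Z3"), ("alzheimers", "Z3"),
    ("Z4", "depression"), ("Z4", "alzheimers"), ("depression", "Z4"),
]

roles = classify_roles(edges, exposure="depression", outcome="alzheimers")
for variable, found in roles.items():
    print(variable, sorted(found))
```

Z4 illustrates a combined-role variable: it satisfies both the confounder and the mediator pattern, the kind of case the Results flag as needing special treatment.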

RESULTS

Our search identified 128 confounders, including 58 phenotypes, 47 drugs, 35 genes, 23 collider, and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles.

CONCLUSION

Our findings suggest combining machine reading and KG could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression with AD highlights the need for standardized field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.

Jiang Y, Kavuluru R. End-to-End n-ary Relation Extraction for Combination Drug Therapies. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2023;2023:72-80. [PMID: 38283165 PMCID: PMC10814995 DOI: 10.1109/ichi57859.2023.00021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]

Millikin RJ, Raja K, Steill J, Lock C, Tu X, Ross I, Tsoi LC, Kuusisto F, Ni Z, Livny M, Bockelman B, Thomson J, Stewart R. Serial KinderMiner (SKiM) Discovers and Annotates Biomedical Knowledge Using Co-Occurrence and Transformer Models. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.30.542911. [PMID: 37397987 PMCID: PMC10312590 DOI: 10.1101/2023.05.30.542911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]

Abstract

Background

The PubMed database contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: 1) they identify a relationship but not the type of relationship, 2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, 3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or 4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues.

Results

Conclusions

Hsiao TK, Torvik VI. OpCitance: Citation contexts identified from the PubMed Central open access articles. Sci Data 2023;10:243. [PMID: 37117220 PMCID: PMC10139909 DOI: 10.1038/s41597-023-02134-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Accepted: 04/04/2023] [Indexed: 04/30/2023] Open

Sousa DF, Couto FM. K-RET: knowledgeable biomedical relation extraction system. BIOINFORMATICS (OXFORD, ENGLAND) 2023;39:7108769. [PMID: 37018156 PMCID: PMC10112952 DOI: 10.1093/bioinformatics/btad174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 02/25/2023] [Accepted: 03/29/2023] [Indexed: 04/20/2023]

Taneja SB, Callahan TJ, Paine MF, Kane-Gill SL, Kilicoglu H, Joachimiak MP, Boyce RD. Developing a Knowledge Graph for Pharmacokinetic Natural Product-Drug Interactions. J Biomed Inform 2023;140:104341. [PMID: 36933632 PMCID: PMC10150409 DOI: 10.1016/j.jbi.2023.104341] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/24/2022] [Revised: 01/09/2023] [Accepted: 03/13/2023] [Indexed: 03/17/2023]

Abstract

BACKGROUND

Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical or other natural products are co-consumed with pharmaceutical drugs. With the growing use of natural products, the risk for potential NPDIs and consequent adverse events has increased. Understanding mechanisms of NPDIs is key to preventing or minimizing adverse events. Although biomedical knowledge graphs (KGs) have been widely used for drug-drug interaction applications, computational investigation of NPDIs is novel. We constructed NP-KG as a first step toward computational discovery of plausible mechanistic explanations for pharmacokinetic NPDIs that can be used to guide scientific research.

METHODS

We developed a large-scale, heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature. To construct the KG, biomedical ontologies and drug databases were integrated with the Phenotype Knowledge Translator framework. The semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler, were used to extract semantic predications (subject-relation-object triples) from full texts of the scientific literature related to the exemplar natural products green tea and kratom. A literature-based graph constructed from the predications was integrated into the ontology-grounded KG to create NP-KG. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through KG path searches and meta-path discovery to determine congruent and contradictory information in NP-KG compared to ground truth data. We also conducted an error analysis to identify knowledge gaps and incorrect predications in the KG.
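A KG path search of the kind mentioned above can be sketched as a breadth-first search over subject-relation-object triples, returning every simple path between a natural product and a drug within a hop limit. This is a generic illustration, not the NP-KG implementation; the node names, predicates, and the `max_hops` parameter are hypothetical placeholders.

```python
from collections import deque


def find_paths(edges, start, goal, max_hops=3):
    """Breadth-first search for simple paths of at most max_hops edges."""
    adjacency = {}
    for subj, pred, obj in edges:
        adjacency.setdefault(subj, []).append((pred, obj))

    paths = []
    queue = deque([[start]])  # each path alternates node, predicate, node, ...
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) // 2 >= max_hops:
            continue  # hop limit reached without finding the goal
        for pred, nxt in adjacency.get(node, []):
            if nxt not in path[::2]:  # do not revisit a node on this path
                queue.append(path + [pred, nxt])
    return paths


# Hypothetical predications; the entity names are placeholders, not NP-KG content.
edges = [
    ("green_tea", "inhibits", "transporter_X"),
    ("transporter_X", "transports", "drug_Y"),
    ("green_tea", "contains", "compound_Q"),
]

paths = find_paths(edges, "green_tea", "drug_Y")
for p in paths:
    print(" -> ".join(p))
```

Each returned path is a candidate mechanistic explanation that would still need checking against ground truth, which is how the evaluation sorts paths into congruent and contradictory information.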

RESULTS

The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information compared to ground truth data. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature.

CONCLUSION

NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify known pharmacokinetic interactions between natural products and pharmaceutical drugs mediated by drug metabolizing enzymes and transporters. Future work will incorporate context, contradiction analysis, and embedding-based methods to enrich NP-KG. NP-KG is publicly available at https://doi.org/10.5281/zenodo.6814507. The code for relation extraction, KG construction, and hypothesis generation is available at https://github.com/sanyabt/np-kg.

Lyu K, Tian Y, Shang Y, Zhou T, Yang Z, Liu Q, Yao X, Zhang P, Chen J, Li J. Causal knowledge graph construction and evaluation for clinical decision support of diabetic nephropathy. J Biomed Inform 2023;139:104298. [PMID: 36731730 DOI: 10.1016/j.jbi.2023.104298] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/30/2022] [Revised: 12/25/2022] [Accepted: 01/25/2023] [Indexed: 01/31/2023]

Abstract

BACKGROUND

Many important clinical decisions require causal knowledge (CK) to take action. Although many causal knowledge bases for medicine have been constructed, a comprehensive evaluation based on real-world data and methods for handling potential knowledge noise are still lacking.

OBJECTIVE

The objectives of our study are threefold: (1) propose a framework for the construction of a large-scale and high-quality causal knowledge graph (CKG); (2) design the methods for knowledge noise reduction to improve the quality of the CKG; (3) evaluate the knowledge completeness and accuracy of the CKG using real-world data.

MATERIAL AND METHODS

We extracted causal triples from three knowledge sources (SemMedDB, UpToDate and Churchill's Pocketbook of Differential Diagnosis) based on rule methods and language models, performed ontological encoding, and then designed semantic modeling between electronic health record (EHR) data and the CKG to complete knowledge instantiation. We proposed two graph pruning strategies (co-occurrence ratio and causality ratio) to reduce the potential noise introduced by SemMedDB. Finally, the evaluation was carried out by taking the diagnostic decision support (DDS) of diabetic nephropathy (DN) as a real-world case. The data originated from a Chinese hospital EHR system from October 2010 to October 2020. The knowledge completeness and accuracy of the CKG were evaluated based on three state-of-the-art embedding methods (R-GCN, MHGRN and MedPath), the annotated clinical text and the expert review, respectively.

RESULTS

This graph included 153,289 concepts and 1,719,968 causal triples. Data from a total of 1427 inpatients were used for evaluation. Better results were achieved by combining three knowledge sources than using only SemMedDB (three models: area under the receiver operating characteristic curve (AUC): p < 0.01, F1: p < 0.01), and the graph covered 93.9% of the causal relations between diseases and diagnostic evidence recorded in clinical text. Causal relations played a vital role in all relations related to disease progression for DDS of DN (three models: AUC: p > 0.05, F1: p > 0.05), and after pruning, the knowledge accuracy of the CKG was significantly improved (three models: AUC: p < 0.01, F1: p < 0.01; expert review: average accuracy: +5.5%).

CONCLUSIONS

The results demonstrated that our proposed CKG could completely and accurately capture the abstract CK under the concrete EHR data, and the pruning strategies could improve the knowledge accuracy of our CKG. The CKG has the potential to be applied to the DDS of diseases.


Sakor A, Jozashoori S, Niazmand E, Rivas A, Bougiatiotis K, Aisopos F, Iglesias E, Rohde PD, Padiya T, Krithara A, Paliouras G, Vidal ME. Knowledge4COVID-19: A semantic-based approach for constructing a COVID-19 related knowledge graph from various sources and analyzing treatments' toxicities. WEB SEMANTICS (ONLINE) 2023;75:100760. [PMID: 36268112 PMCID: PMC9558693 DOI: 10.1016/j.websem.2022.100760] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/08/2021] [Revised: 09/08/2022] [Accepted: 10/05/2022] [Indexed: 05/20/2023]

Affiliation(s)

Ahmad Sakor TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany
Samaneh Jozashoori TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany
Emetis Niazmand TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany
Ariam Rivas TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany
Konstantinos Bougiatiotis Institute of Informatics & Telecommunications, NCSR Demokritos, Patr. Grigoriou & Neapoleos Str, Ag. Paraskevi, Athens, Greece Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Panepistimiou 30, Athens, Greece
Fotis Aisopos Institute of Informatics & Telecommunications, NCSR Demokritos, Patr. Grigoriou & Neapoleos Str, Ag. Paraskevi, Athens, Greece
Enrique Iglesias TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany
Philipp D Rohde TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany
Trupti Padiya TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany
Anastasia Krithara Institute of Informatics & Telecommunications, NCSR Demokritos, Patr. Grigoriou & Neapoleos Str, Ag. Paraskevi, Athens, Greece
Georgios Paliouras Institute of Informatics & Telecommunications, NCSR Demokritos, Patr. Grigoriou & Neapoleos Str, Ag. Paraskevi, Athens, Greece
Maria-Esther Vidal TIB Leibniz Information Centre for Science and Technology, Welfengarten 1 B, Hannover, Germany L3S Research Center, University of Hannover, Appelstraße 9a, Hannover, Germany


Ma C, Zhou Z, Liu H, Koslicki D. KGML-xDTD: a knowledge graph-based machine learning framework for drug treatment prediction and mechanism description. Gigascience 2022;12:giad057. [PMID: 37602759 PMCID: PMC10441000 DOI: 10.1093/gigascience/giad057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 05/05/2023] [Accepted: 07/04/2023] [Indexed: 08/22/2023] Open

Abstract

BACKGROUND

Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) of existing drugs/compounds. It is especially critical for emerging and/or orphan diseases due to its cheaper investment and shorter research cycle compared with traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a main obstacle for computational drug repurposing methods to be widely adopted in clinical settings.

RESULTS

In this work, we propose KGML-xDTD: a Knowledge Graph-based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a 2-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via knowledge graph (KG) path-based, testable MOAs. We leverage knowledge-and-publication-based information to extract biologically meaningful "demonstration paths" as the intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance in both predictions of drug repurposing and recapitulation of human-curated drug MOA paths.

CONCLUSIONS

KGML-xDTD is the first model framework that can offer KG path explanations for drug repurposing predictions by leveraging the combination of prediction outcomes and existing biological knowledge and publications. We believe it can effectively reduce "black-box" concerns and increase prediction confidence for drug repurposing based on predicted path-based explanations and further accelerate the process of drug discovery for emerging diseases.


Zou L, Bao W, Gao Y, Chen M, Wu Y, Wang S, Li C, Zhang J, Zhang D, Wang Q, Zhu A. Integrated Analysis of Transcriptome and microRNA Profile Reveals the Toxicity of Euphorbia Factors toward Human Colon Adenocarcinoma Cell Line Caco-2. Molecules 2022;27:molecules27206931. [PMID: 36296525 PMCID: PMC9608949 DOI: 10.3390/molecules27206931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022] [Indexed: 11/22/2022] Open

Affiliation(s)

Lingyue Zou Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
Wenqiang Bao Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
Yadong Gao Department of Toxicology, School of Public Health, Peking University, Beijing 100191, China Fujian Provincial Key Laboratory of Zoonosis Research, Fujian Center for Disease Control and Prevention, Fuzhou 350001, China
Mengting Chen Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
Yajiao Wu Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
Shuo Wang Department of Toxicology, School of Public Health, Peking University, Beijing 100191, China
Chutao Li Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
Jian Zhang Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
Dongcheng Zhang Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
Qi Wang Department of Toxicology, School of Public Health, Peking University, Beijing 100191, China Key Laboratory of State Administration of Traditional Chinese Medicine for Compatibility Toxicology, Beijing 100191, China Beijing Key Laboratory of Toxicological Research and Risk Assessment for Food Safety, Beijing 100191, China
An Zhu Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China


Foksinska A, Crowder CM, Crouse AB, Henrikson J, Byrd WE, Rosenblatt G, Patton MJ, He K, Tran-Nguyen TK, Zheng M, Ramsey SA, Amin N, Osborne J, Might M. The precision medicine process for treating rare disease using the artificial intelligence tool mediKanren. Front Artif Intell 2022;5:910216. [PMID: 36248623 PMCID: PMC9562701 DOI: 10.3389/frai.2022.910216] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/01/2022] [Accepted: 08/23/2022] [Indexed: 12/03/2022] Open

Nian Y, Hu X, Zhang R, Feng J, Du J, Li F, Bu L, Zhang Y, Chen Y, Tao C. Mining on Alzheimer's diseases related knowledge graph to identity potential AD-related semantic triples for drug repurposing. BMC Bioinformatics 2022;23:407. [PMID: 36180861 PMCID: PMC9523633 DOI: 10.1186/s12859-022-04934-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/14/2022] [Accepted: 09/16/2022] [Indexed: 11/10/2022] Open

Lee Y, Son J, Song M. BertSRC: transformer-based semantic relation classification. BMC Med Inform Decis Mak 2022;22:234. [PMID: 36068535 PMCID: PMC9446816 DOI: 10.1186/s12911-022-01977-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Accepted: 08/11/2022] [Indexed: 11/13/2022] Open

Abstract

The relationship between biomedical entities is complex, and many of them have not yet been identified. For many biomedical research areas including drug discovery, it is of paramount importance to identify the relationships that have already been established through a comprehensive literature survey. However, manually searching through literature is difficult as the amount of biomedical publications continues to increase. Therefore, the relation classification task, which automatically mines meaningful relations from the literature, is spotlighted in the field of biomedical text mining. By applying relation classification techniques to the accumulated biomedical literature, existing semantic relations between biomedical entities that can help to infer previously unknown relationships are efficiently grasped. To develop semantic relation classification models, which is a type of supervised machine learning, it is essential to construct a training dataset that is manually annotated by biomedical experts with semantic relations among biomedical entities. Any advanced model must be trained on a dataset with reliable quality and meaningful scale to be deployed in the real world and to assist biologists in their research. In addition, as the number of such public datasets increases, the performance of machine learning algorithms can be accurately revealed and compared by using those datasets as a benchmark for model development and improvement. In this paper, we aim to build such a dataset. Along with that, to validate the usability of the dataset as training data for relation classification models and to improve the performance of the relation extraction task, we built a relation classification model based on Bidirectional Encoder Representations from Transformers (BERT) trained on our dataset, applying our newly proposed fine-tuning methodology. In experiments comparing performance among several models based on different deep learning algorithms, our model with the proposed fine-tuning methodology showed the best performance. The experimental results show that the constructed training dataset is an important information resource for the development and evaluation of semantic relation extraction models. Furthermore, relation extraction performance can be improved by integrating our proposed fine-tuning methodology. Therefore, this can lead to the promotion of future text mining research in the biomedical field.


Sosa DN, Altman RB. Contexts and contradictions: a roadmap for computational drug repurposing with knowledge inference. Brief Bioinform 2022;23:6640007. [PMID: 35817308 PMCID: PMC9294417 DOI: 10.1093/bib/bbac268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/25/2022] [Accepted: 06/07/2022] [Indexed: 11/30/2022] Open

Schutte D, Vasilakes J, Bompelli A, Zhou Y, Fiszman M, Xu H, Kilicoglu H, Bishop JR, Adam T, Zhang R. Discovering novel drug-supplement interactions using SuppKG generated from the biomedical literature. J Biomed Inform 2022;131:104120. [PMID: 35709900 PMCID: PMC9335448 DOI: 10.1016/j.jbi.2022.104120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Revised: 04/26/2022] [Accepted: 06/08/2022] [Indexed: 12/04/2022]

Abstract

Objective:

Develop a novel methodology to create a comprehensive knowledge graph (SuppKG) to represent a domain with limited coverage in the Unified Medical Language System (UMLS), specifically dietary supplement (DS) information for discovering drug-supplement interactions (DSI), by leveraging biomedical natural language processing (NLP) technologies and a DS domain terminology.

Materials and Methods:

We created SemRepDS (an extension of an NLP tool, SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT model to remove incorrect relations before generating SuppKG. Two discovery pathways were applied to SuppKG to identify potential DSIs, which were then compared with an existing DSI database and also evaluated by medical professionals for mechanistic plausibility.

Results:

SemRepDS returned 158.5% more DS entities and 206.9% more DS relations than SemRep. The fine-tuned PubMedBERT model, which significantly outperformed other machine learning and BERT models, obtained an F1 score of 0.8605 and removed 43.86% of semantic relations, improving the precision of the relations by 26.4% over pre-filtering. SuppKG consists of 56,635 nodes and 595,222 directed edges with 2,928 DS-specific nodes and 164,738 edges. Manual review of findings identified 182 of 250 (72.8%) proposed DS-Gene-Drug and 77 of 100 (77%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible.

Discussion:

With added DS terminology to the UMLS, SemRepDS has the capability to find more DS-specific semantic relationships from PubMed than SemRep. The utility of the resulting SuppKG was demonstrated using discovery patterns to find novel DSIs.

Conclusion:

For a domain with limited coverage in traditional terminologies (e.g., UMLS), we demonstrated an approach to leverage domain terminology and improve existing NLP tools to generate a more comprehensive knowledge graph for the downstream task. Even though this study focuses on DSI, the method may be adapted to other domains.


Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements. JOURNAL OF DATA AND INFORMATION SCIENCE 2022. [DOI: 10.2478/jdis-2022-0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open

Abstract

PURPOSE

Given the information overload of scientific literature, there is an increasing need for computable biomedical knowledge buried in free text. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.

DESIGN/METHODOLOGY/APPROACH

Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge represented at the level of subject-object pairs (SO pairs).

FINDINGS

The results indicated an extraordinary growth of cardiovascular publications in China but only a modest growth of novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the Top 10 SO pairs with the highest IE, which implied epistemic status pluralism. Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.

RESEARCH LIMITATIONS

The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.

PRACTICAL IMPLICATIONS

Our approach identified major uncertain knowledge areas such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. These areas are suggested to be prioritized; new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.

ORIGINALITY/VALUE

We provided a novel approach by combining natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.
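The use of information entropy as an uncertainty metric for SO pairs can be illustrated with a minimal sketch. It assumes that each SO pair has a count of supporting statements per epistemic status and applies standard Shannon entropy to that distribution; the abstract does not give the exact formulation used in the study, and the status labels and counts below are hypothetical.

```python
import math


def shannon_entropy(status_counts):
    """Shannon entropy (in bits) of the epistemic-status distribution for one SO pair."""
    total = sum(status_counts.values())
    if total == 0:
        raise ValueError("at least one statement is required")
    entropy = 0.0
    for count in status_counts.values():
        if count > 0:  # zero-count categories contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts of statements per epistemic status for two SO pairs.
contested_pair = {"certain": 2, "hedging": 1, "conflicting": 1}
settled_pair = {"certain": 5, "hedging": 0, "conflicting": 0}

print(shannon_entropy(contested_pair))  # higher value: statuses are mixed
print(shannon_entropy(settled_pair))    # zero: every statement has the same status
```

Under this reading, a pair whose statements all share one status scores zero, and the score rises as the statements spread more evenly across statuses.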

Lardos A, Aghaebrahimian A, Koroleva A, Sidorova J, Wolfram E, Anisimova M, Gil M. Computational Literature-based Discovery for Natural Products Research: Current State and Future Prospects. FRONTIERS IN BIOINFORMATICS 2022;2:827207. [PMID: 36304281 PMCID: PMC9580913 DOI: 10.3389/fbinf.2022.827207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2021] [Accepted: 02/28/2022] [Indexed: 11/21/2022] Open

A Knowledge Graph Completion Method Applied to Literature-Based Discovery for Predicting Missing Links Targeting Cancer Drug Repurposing. Artif Intell Med 2022. [DOI: 10.1007/978-3-031-09342-5_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]

Balasubramanian V, Vivekanandhan S, Mahadevan V. Pandemic tele-smart: a contactless tele-health system for efficient monitoring of remotely located COVID-19 quarantine wards in India using near-field communication and natural language processing system. Med Biol Eng Comput 2021;60:61-79. [PMID: 34705163 PMCID: PMC8548353 DOI: 10.1007/s11517-021-02456-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/28/2020] [Accepted: 10/07/2021] [Indexed: 11/28/2022]

Mante J, Hao Y, Jett J, Joshi U, Keating K, Lu X, Nakum G, Rodriguez NE, Tang J, Terry L, Wu X, Yu E, Downie JS, McInnes BT, Nguyen MH, Sepulvado B, Young EM, Myers CJ. Synthetic Biology Knowledge System. ACS Synth Biol 2021;10:2276-2285. [PMID: 34387462 DOI: 10.1021/acssynbio.1c00188] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 02/07/2023]

Henry S, Wijesinghe DS, Myers A, McInnes BT. Using Literature Based Discovery to Gain Insights Into the Metabolomic Processes of Cardiac Arrest. Front Res Metr Anal 2021;6:644728. [PMID: 34250435 PMCID: PMC8267364 DOI: 10.3389/frma.2021.644728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/21/2020] [Accepted: 05/07/2021] [Indexed: 12/19/2022] Open

Malec SA, Wei P, Bernstam EV, Boyce RD, Cohen T. Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance. J Biomed Inform 2021;117:103719. [PMID: 33716168 PMCID: PMC8559730 DOI: 10.1016/j.jbi.2021.103719] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/15/2020] [Revised: 12/31/2020] [Accepted: 01/04/2021] [Indexed: 10/21/2022]

Abstract

INTRODUCTION

Drug safety research asks causal questions but relies on observational data. Confounding bias threatens the reliability of studies using such data. The successful control of confounding requires knowledge of variables called confounders affecting both the exposure and outcome of interest. However, causal knowledge of dynamic biological systems is complex and challenging. Fortunately, computable knowledge mined from the literature may hold clues about confounders. In this paper, we tested the hypothesis that incorporating literature-derived confounders can improve causal inference from observational data.

METHODS

We introduce two methods (semantic vector-based and string-based confounder search) that query literature-derived information for confounder candidates to control, using SemMedDB, a database of computable knowledge mined from the biomedical literature. These methods search SemMedDB for confounders by applying semantic constraint search for indications treated by the drug (exposure) and that are also known to cause the adverse event (outcome). We then include the literature-derived confounder candidates in statistical and causal models derived from free-text clinical notes. For evaluation, we use a reference dataset widely used in drug safety containing labeled pairwise relationships between drugs and adverse events and attempt to rediscover these relationships from a corpus of 2.2 M NLP-processed free-text clinical notes. We employ standard adjustment and causal inference procedures to predict and estimate causal effects by informing the models with varying numbers of literature-derived confounders and instantiating the exposure, outcome, and confounder variables in the models with dichotomous EHR-derived data. Finally, we compare the results from applying these procedures with naive measures of association (χ2 and reporting odds ratio) and with each other.
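The naive measures of association named above (χ2 and reporting odds ratio) can be computed from a 2×2 drug-by-event contingency table. The sketch below uses the textbook definitions, with a Pearson chi-square statistic without continuity correction; the cell counts are hypothetical, and the study's own computation may differ in detail.

```python
def reporting_odds_ratio(a, b, c, d):
    """ROR for a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    """
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined when b or c is zero")
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("chi-square is undefined when a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts: 100 notes mention the drug, 100 do not.
a, b, c, d = 20, 80, 10, 90
print(reporting_odds_ratio(a, b, c, d))
print(chi_square_2x2(a, b, c, d))
```

Both measures ignore confounders entirely, which is why they serve as the unadjusted baseline against which the confounder-informed models are compared.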

RESULTS AND CONCLUSIONS

We found semantic vector-based search to be superior to string-based search at reducing confounding bias. However, the effect of including more rather than fewer literature-derived confounders was inconclusive. We recommend using targeted learning estimation methods that can address treatment-confounder feedback, where confounders also behave as intermediate variables, and engaging subject-matter experts to adjudicate the handling of problematic covariates.


Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H. Drug repurposing for COVID-19 via knowledge graph completion. J Biomed Inform 2021;115:103696. [PMID: 33571675 PMCID: PMC7869625 DOI: 10.1016/j.jbi.2021.103696] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 10/19/2020] [Revised: 12/23/2020] [Accepted: 02/01/2021] [Indexed: 02/07/2023]

Abstract

OBJECTIVE

To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods.

METHODS

We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from PubMed and other COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative and accurate subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant. We used this subset to construct a knowledge graph, and applied five state-of-the-art, neural knowledge graph completion algorithms (i.e., TransE, RotatE, DistMult, ComplEx, and STELP) to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach.
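As an illustration of knowledge graph completion with a translational model, the sketch below scores candidate tails with the standard TransE idea (a triple is plausible when head + relation lies close to tail) and computes rank-based metrics. The 2-D embeddings and entity names are hypothetical, no training is performed, and the use of mean reciprocal rank alongside Hits@k is an assumption, since the abstract does not define its metrics.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail; higher is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_of_true_tail(head, relation, true_tail, candidates):
    """1-based rank of the true tail among all candidate tail entities."""
    true_score = transe_score(head, relation, candidates[true_tail])
    better = sum(
        1
        for name, vec in candidates.items()
        if name != true_tail and transe_score(head, relation, vec) > true_score
    )
    return better + 1


def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical embeddings; a real model would learn these from the graph.
treats = (0.0, 1.0)
drugs = {"drug_a": (1.0, 0.0), "drug_b": (2.0, 2.0)}
diseases = {"disease_1": (1.0, 1.0), "disease_2": (3.0, 3.0), "disease_3": (0.0, 0.0)}

# Held-out test triples: (drug, TREATS, disease).
test_triples = [("drug_a", "disease_1"), ("drug_b", "disease_3")]
ranks = [rank_of_true_tail(drugs[d], treats, t, diseases) for d, t in test_triples]

print(ranks, mean_reciprocal_rank(ranks), hits_at_k(ranks, 1))
```

In a drug repurposing setting the same ranking would be run with the disease fixed and drugs as candidates, so that highly ranked but unobserved drugs become repurposing hypotheses.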

RESULTS

The accuracy classifier based on PubMedBERT achieved the best performance (F1 = 0.854) in identifying accurate semantic predications. Among the five knowledge graph completion models, TransE outperformed the others (MR = 0.923, Hits@1 = 0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as others that have not yet been studied. Discovery patterns enabled identification of additional candidate drugs and generation of plausible hypotheses regarding the links between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (i.e., paclitaxel, SB 203580, alpha 2-antiplasmin, metoclopramide, and oxymatrine) and the mechanistic explanations for their potential use are further discussed.

CONCLUSION

We showed that an LBD approach can be feasible not only for discovering drug candidates for COVID-19, but also for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions. Source code and data are available at https://github.com/kilicogluh/lbd-covid.


Towards medical knowmetrics: representing and computing medical knowledge using semantic predications as the knowledge unit and the uncertainty as the knowledge context. Scientometrics 2021;126:6225-6251. [PMID: 33612884 PMCID: PMC7882417 DOI: 10.1007/s11192-021-03880-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/05/2020] [Accepted: 01/19/2021] [Indexed: 11/05/2022]

Abstract

In China, Prof. Hongzhou Zhao and Zeyuan Liu are the pioneers of the concepts of “knowledge unit” and “knowmetrics” for measuring knowledge. However, the definition of “computable knowledge object” remains controversial across fields. For example, it is defined as (1) a quantitative scientific concept in natural science and engineering, (2) a knowledge point in the field of education research, and (3) semantic predications, i.e., Subject-Predicate-Object (SPO) triples, in biomedical fields. The Semantic MEDLINE Database (SemMedDB), a high-quality public repository of SPO triples extracted from medical literature, provides a basic data infrastructure for measuring medical knowledge. In general, the study of extracting SPO triples as computable knowledge units from unstructured scientific text has overwhelmingly focused on scientific knowledge per se. Since SPO triples may be extracted from hypothetical or speculative statements, or even from conflicting and contradictory assertions, the knowledge status (i.e., the uncertainty), which serves as an integral and critical part of scientific knowledge, has been largely overlooked. This article aims to put forward a framework for Medical Knowmetrics using the SPO triples as the knowledge unit and the uncertainty as the knowledge context. The lung cancer publications dataset is used to validate the proposed framework. The uncertainty of medical knowledge and how its status evolves over time indirectly reflect the strength of competing knowledge claims and the probability of certainty for a given SPO triple. We discuss the new insights offered by uncertainty-centric approaches for detecting research fronts and identifying knowledge claims with a high certainty level, in order to improve the efficacy of knowledge-driven decision support.


Biziukova N, Tarasova O, Ivanov S, Poroikov V. Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies. Front Genet 2021;11:618862. [PMID: 33414815 PMCID: PMC7783389 DOI: 10.3389/fgene.2020.618862] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2020] [Accepted: 11/26/2020] [Indexed: 12/16/2022] Open

Abstract

Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications represent a primary source of information, which is especially important for collecting data of the highest quality because the information is obtained directly rather than through databases. In our study, we aimed at the development and testing of an approach to named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on conditional random fields, which provides recognition of NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest leads to the possibility of extracting the NEs strongly associated with the subject. To test the applicability of our approach, we have applied it for the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments performed provide estimates of the accuracy of recognition of chemical NEs and proteins (genes). The precision of chemical NE recognition is over 0.91, recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of protein and gene names is over 0.86, recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms our suggestion about the possibility of extracting the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, i.e., the group of HIV-positive individuals with an ability to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.
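The F1-score mentioned above is the harmonic mean of precision and recall. The short sketch below evaluates that formula on the rounded values quoted in the abstract; because the abstract gives the precisions only as lower bounds ("over 0.91", "over 0.86"), the computed values are approximations and need not match the reported F1-scores exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract; the true precisions are stated to be higher.
chemical_f1 = f1_score(0.91, 0.86)
protein_gene_f1 = f1_score(0.86, 0.83)

print(round(chemical_f1, 3), round(protein_gene_f1, 3))
```

Since the harmonic mean is pulled toward the smaller of the two inputs, a system cannot reach a high F1-score by maximizing precision or recall alone.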


Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials Using NLP-ADAPT. Artif Intell Med 2020. [DOI: 10.1007/978-3-030-59137-3_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]